The Road to AGI
Will humans ever be able to build artificial general intelligence (AGI), AI as intelligent and capable as humans across all tasks? If so, how might we build it? And once we get there, how will we know?
Witnessing AGI in our lifetime was a distant dream for many just a decade ago. However, with the emergence of large language models like ChatGPT and Gemini, which display a remarkably broad understanding of the world (despite occasional glaring inaccuracies), the possibility of AGI is becoming more tangible. It might become a reality before the end of this decade.
The pursuit of AGI is a race, with leading contenders such as OpenAI, Google DeepMind, Anthropic, and Meta each charting their own path. While the capabilities of the latest models like GPT-4o, Gemini 1.5 Pro, Claude 3 Opus, and Llama 3 are impressive, they still exhibit significant limitations. The road to AGI is paved with extensive research and development, and several key challenges must be overcome before we can even approach it.
System 1 vs System 2
Daniel Kahneman’s fascinating book Thinking, Fast and Slow discusses the two main subsystems of our brains: System 1 and System 2. If I ask you your name or the sum of 2 plus 2, your answer will come from System 1, a reflexive response based on information retrieval rather than deliberation or deep thought. If I ask you to multiply 23 by 417 or to plan a birthday party, you’ll use System 2, the part of your brain that does reasoning and planning, and that takes time to reach conclusions. Today’s LLMs are all sophisticated versions of System 1 thinking. To build AGI, we must develop machines with System 2 thinking: systems that understand how to break problems down, explore possible solutions, assess the best approach, and execute it. This might require many steps in a chain with multiple iterative loops.
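To make the distinction concrete, here is a toy sketch in Python, purely illustrative and not how any LLM actually works: System 1 as instant fact lookup, System 2 as deliberate, multi-step computation. The function names and the tiny fact table are invented for this example.

```python
# A toy contrast between System 1 and System 2 answers.
# System 1: reflexive recall. System 2: deliberate, multi-step work.
# This illustrates the idea only; it is not how any LLM is implemented.

FACTS = {"2 + 2": 4}  # things we "just know"

def system1(question: str):
    """Reflexive recall: answer instantly or not at all."""
    return FACTS.get(question)

def system2_multiply(a: int, b: int) -> int:
    """Deliberate reasoning: break the problem into steps,
    solve each step, and combine the results."""
    total = 0
    for place, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10**place  # one partial product per digit
        print(f"step: {a} x {digit} x 10^{place} = {partial}")
        total += partial
    return total

print(system1("2 + 2"))           # 4, instantly
print(system1("23 x 417"))        # None: no stored fact to recall
print(system2_multiply(23, 417))  # 9591, via explicit intermediate steps
```

The point is the shape of the computation: System 1 returns immediately or not at all, while System 2 produces intermediate steps that build toward the answer.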
Much of DeepMind’s research has focused on building System 2 abilities. The rumored Q-Star capability that OpenAI is reportedly working on is thought to give models basic reasoning skills that let them solve math problems they have not encountered before.
Generalization and Transfer Learning
The algorithms needed to train a narrow AI model are well understood. A narrow AI is one that’s designed to do one thing and one thing only: an AI good at filtering spam can’t play chess, and a chess-playing AI can’t filter spam. You can’t build an AGI by bolting together many narrow AIs; that’s like trying to get to the moon by building a tower. To get beyond narrow AI, we need to build rocket ships: sophisticated models that can learn, adapt, and generalize.
A good example is the work being done by researchers at DeepMind. In 2020, they built Agent57, a model that learned to play not one specific classic Atari video game but all 57 games in the standard benchmark. Their Gato model is a generalist agent able to act as a chatbot, label images, play Atari, and stack blocks with a robotic arm: one model able to perform a wide range of tasks.
World Models and Embodiment
While large language models demonstrate an impressive understanding of the world as captured in language, there is a lot that they still don’t understand. Many researchers argue that not all that can be known about the world is described in language. To build AGI, we will first need to build embodied intelligence, an AI that resides in a physical form so it can explore the world, interact with it, and exert agency within it. An embodied intelligence can learn about cause and effect, the laws of physics, and how the world works. Think about how babies learn about the world at an early age, and you get the idea.
Context and Episodic Memory
People who have lost the ability to make or recall memories have difficulty functioning in the world. And so it is with artificial intelligence. A high-functioning AI must understand context, what has come before that shapes how it should proceed, and have episodic memory, the ability to remember and learn from its experiences.

One of the differentiators between the various frontier models is their context window: the number of tokens they can ingest while working on your query. Models with small context windows will ‘forget’ the first things you told them if your prompt is too long to fit. Early models had context windows of just a few thousand tokens (where a token is roughly three-quarters of a word), but the latest models boast larger and larger context windows. Google’s Gemini 1.5 Pro model has a context window of 1 million tokens, enough to ingest over 700,000 words, more than 30,000 lines of code, 11 hours of audio, or an hour of video. At Google I/O, Google announced that developers now have access to a version of Gemini with a 2 million token context window and shared their goal of creating models with infinite context.

Expanding context itself won’t be enough, though. Models will also need mechanisms to remember their experiences and learn from them. Research into episodic memory is ongoing.
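As a simplified illustration of why a model ‘forgets’, here is a toy Python sketch in which tokens beyond the window fall off the front of the conversation. The whitespace ‘tokenizer’ and the tiny window size are stand-ins I’ve invented for the example; real models use subword tokenizers and windows thousands to millions of tokens long.

```python
# Toy illustration of context-window "forgetting". Real models use subword
# tokenizers (BPE and friends) and far larger windows; splitting on
# whitespace is a simplification for this sketch.

CONTEXT_WINDOW = 8  # tokens, deliberately tiny so the effect is visible

def tokenize(text: str) -> list[str]:
    return text.split()  # stand-in for a real tokenizer

def visible_context(history: list[str]) -> list[str]:
    """Return only the most recent tokens that fit in the window."""
    tokens = [tok for turn in history for tok in tokenize(turn)]
    return tokens[-CONTEXT_WINDOW:]  # older tokens are silently dropped

history = ["my name is Ada", "I live in London", "please plan my birthday party"]
print(visible_context(history))
# ['live', 'in', 'London', 'please', 'plan', 'my', 'birthday', 'party']
# The user's name has already fallen out of the window.
```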
Theory of Mind
Another dimension needed to build an AGI is the notion of ‘theory of mind,’ the ability to evaluate and understand the internal mental state of another being. We humans use theory of mind every day when we empathize with someone, feel sympathy for them, realize they’re having a terrible day, share in their delight at good news, or sense that they’re indifferent to what’s happening around them. Without a theory of mind, it’s hard to build relationships and participate in the social world that’s so important to humans. Research at the University of Hamburg suggests that LLMs are already able, at some level, to evaluate users’ mental states.
Energy Consumption
The lump of grey spongy material that you carry around inside your skull weighs about 3 lb (1.4 kg) and consumes about 20 W of power when you’re noodling hard on something. Contrast that with the power consumption of a typical AI data center, which can quickly suck enough energy to make a power station tremble at the knees. The brute-force methods we use to build generative artificial intelligence today, with lengthy, energy-intensive training runs and inference that performs trillions of arithmetic operations to answer a simple question, are many orders of magnitude less efficient than our biological intelligence.
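To get a feel for the gap, here is a rough back-of-envelope calculation in Python. The 20 W figure comes from the paragraph above; the 10 GWh training-run figure is purely an assumption for illustration, since published estimates for frontier training runs vary widely.

```python
# Back-of-envelope comparison: brain power vs. an assumed training run.
# All figures are rough and illustrative, not measurements.

brain_watts = 20  # ~20 W, per the text above
brain_kwh_per_year = brain_watts * 24 * 365 / 1000
print(f"One brain, thinking for a year: ~{brain_kwh_per_year:.0f} kWh")  # ~175 kWh

# Assumption for illustration: a large frontier training run of ~10 GWh.
training_run_kwh = 10_000_000
ratio = training_run_kwh / brain_kwh_per_year
print(f"That training run equals ~{ratio:,.0f} brain-years of energy")  # ~57,000
```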
Objective-driven AI
Yann LeCun, an AI luminary and Chief AI Scientist at Meta, believes the AI community needs a fresh approach to reach what he refers to as “human-level intelligence.” He argues that current generative AI approaches “really suck”: they struggle with factual accuracy and cannot truly understand the world and how it works. Because today’s models lack any common-sense understanding of the world and cannot learn quickly from just a few examples the way a child can, he believes we are on the wrong track and that a fundamentally different approach is required. LeCun advocates a shift to objective-driven AI that is less of a pattern-matching tool (which is how most AI operates under the hood today) and works more like living beings that develop rich internal representations of how the world works, how things interact, and how things change over time. Such an AI would use this internal model to simulate outcomes, reason about the future, and then make an informed decision on the best way to achieve a specific objective it has been given.
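That last sentence describes, in essence, planning with an internal world model. Below is a minimal, purely illustrative Python sketch of the loop: simulate candidate action sequences with the model, score the imagined outcomes against an objective, and act on the best plan. Everything here (the number-line world, the hand-coded model, the random-shooting search) is invented for the example; in LeCun’s proposal the world model would be learned, not written by hand.

```python
import random

GOAL = 10  # the objective: reach position 10 on a number line

def world_model(state: int, action: int) -> int:
    """Internal model of the world: predicts the next state from an action."""
    return state + action  # actions are steps of -1, 0, or +1

def objective(state: int) -> float:
    """Score a state against the objective: closer to the goal is better."""
    return -abs(GOAL - state)

def plan(state: int, horizon: int = 5, candidates: int = 200) -> list[int]:
    """Imagine many possible futures with the world model; keep the best plan."""
    best_actions, best_score = [0] * horizon, float("-inf")
    for _ in range(candidates):
        actions = [random.choice([-1, 0, 1]) for _ in range(horizon)]
        simulated = state
        for a in actions:                   # simulate, don't act
            simulated = world_model(simulated, a)
        if objective(simulated) > best_score:
            best_actions, best_score = actions, objective(simulated)
    return best_actions

state = 0
for _ in range(10):                         # act in the world, replanning each step
    action = plan(state)[0]                 # execute only the first planned action
    state = world_model(state, action)      # here the real world matches the model
print(state)                                # the agent ends at or near the goal
```

Replanning after every step is the key design choice: the agent never commits to a whole imagined future, it just keeps simulating ahead and taking the single best next action.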
In Summary
There is still a lot of fundamental research to be done as AI researchers try to find a way to build artificial general intelligence. When I speak with bright people in the industry about when we might reach AGI, I get answers that range from “a couple of years from now” to “never.” The smart money is betting on the 2028 or 2029 timeframe. But if we do get there, how will we know it when we see it? Several tests have been proposed, from the Turing test (which many would argue has already been passed) to the IKEA test (an AI that can figure out how to assemble a complex piece of IKEA furniture) to the employment test (an AI able to do any economically valuable work that a human can).
If humans can ever build AGI, what will it mean for humanity? What will it mean for employment, the economy, and how we feel about ourselves? Will we live lives of leisure and abundance? Or will we be left behind, without purpose, depressed, and finding it hard to create meaning in our lives in a post-labor world? That’s a topic for next time.