Beyond Benchmarks: The Dawn of Experiential AI and Its Implications

The artificial intelligence community has long grappled with how to evaluate and advance generative AI. While models have seemingly "conquered" traditional benchmarks like the Turing Test, a growing debate questions whether current development focuses too heavily on simply excelling at these specific evaluations. Researchers at Google DeepMind argue that the limitation lies not in the tests themselves but in the constrained, static nature of the data used to train AI, which hinders its potential for true innovation.

In a recently published paper, "Welcome to the Era of Experience", DeepMind luminaries David Silver and Richard Sutton propose a radical shift: AI agents must be allowed to learn through "experience", actively engaging with the world to define their own objectives based on environmental feedback. These two pioneers are giants in the field. Silver spearheaded the groundbreaking AlphaGo and its successor AlphaZero, the DeepMind systems that defeated human champions at Go and surpassed the strongest engines at chess. Sutton, a Turing Award recipient, is a principal architect of reinforcement learning, the very technique underpinning those systems' success.

Their vision, dubbed "streams", builds upon the foundation of reinforcement learning and the insights gained from AlphaZero. It aims to address the inherent shortcomings of contemporary large language models (LLMs), which are primarily designed to respond to individual human prompts.

Interestingly, Silver and Sutton note that the rise of generative AI tools like ChatGPT led to a relative sidelining of reinforcement learning after the initial triumphs of AlphaGo and AlphaZero. While this transition facilitated the development of models capable of handling novel human input without predefined rules, it also came at a cost. "Something was lost in this transition: an agent's ability to self-discover its own knowledge," they contend.

Current LLMs, they observe, heavily rely on "human prejudgment", essentially mirroring what humans desire at the prompt stage. This reliance, they argue, imposes an "impenetrable ceiling" on an AI's capabilities, preventing it from uncovering superior strategies that human evaluators might overlook.

Furthermore, the brief, isolated nature of prompt-based interactions restricts AI's development. "In the era of human data, language-based AI has largely focused on short interaction episodes," the researchers explain. "The agent aims exclusively for outcomes within the current episode... Typically, little or no information carries over from one episode to the next, precluding any adaptation over time."

The "Era of Experience", as envisioned by Silver and Sutton, contrasts sharply with this model. Instead of short exchanges, "Agents will inhabit streams of experience, rather than short snippets of interaction." They draw a compelling parallel to human learning, where a lifetime of accumulated experiences shapes our understanding and drives actions based on long-term goals. "Powerful agents should have their own stream of experience that progresses, like humans, over a long timescale," they assert.

Intriguingly, Silver and Sutton believe that "today's technology" is already sufficient to begin constructing these experiential learning systems. Early manifestations can be seen in the emergence of web-browsing AI agents, such as OpenAI's Deep Research. These browser agents signify "a transition from exclusively human-privileged communication to much more autonomous interactions where the agent is able to act independently in the world."

As AI agents evolve beyond mere web navigation, they will require mechanisms to interact with and learn directly from the world. Silver and Sutton propose leveraging the core principles of reinforcement learning, the same approach that powered AlphaZero. In this paradigm, the AI agent acts within an environment, guided by an internal "world model" of how that environment behaves. Through exploration and action, it receives "rewards": feedback signals that indicate the relative value of different actions in different situations and so guide its learning.
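To make this loop concrete, here is a deliberately tiny sketch, not taken from the paper: a tabular Q-learning agent in a hypothetical five-cell corridor learns, purely from reward signals, that stepping right reaches the goal. The environment, reward, and hyperparameters are all illustrative assumptions.

```python
import random

# Toy environment: a five-cell corridor; the agent starts at cell 0
# and receives a reward only on reaching cell 4 (the goal).
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left, step right

def step(state, action):
    """Environment dynamics: move, stay in bounds, reward at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0)

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, steps = 0, 0
        while s != GOAL and steps < 100:
            # Epsilon-greedy: mostly exploit, occasionally explore
            # (random tie-breaking keeps early episodes from stalling).
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: (q[(s, b)], rng.random()))
            nxt, r = step(s, a)
            # Update the value estimate towards reward plus discounted
            # best value of the next state (the Q-learning rule).
            best_next = max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s, steps = nxt, steps + 1
    return q

q = train()
# The learned greedy policy: in every non-goal cell, move right.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
```

The point of the sketch is the shape of the loop, not the toy task: values flow from the environment's reward signal alone, with no human-labelled examples of the right move.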

The real world, they argue, is replete with such "reward" signals, provided the agent is equipped to perceive them. "Where do rewards come from, if not from human data? Once agents become connected to the world through rich action and observation spaces, there will be no shortage of grounded signals to provide a basis for reward. In fact, the world abounds with quantities such as cost, error rates, hunger, productivity, health metrics, climate metrics, profit, sales, exam results, success, visits, yields, stocks, likes, income, pleasure/pain, economic indicators, accuracy, power, distance, speed, efficiency, or energy consumption."

To provide an initial framework for these AI agents, developers might employ "world model" simulations. These simulations allow the AI to make predictions, test them in the real world, and refine the model based on the received reward signals. "As the agent continues to interact with the world throughout its stream of experience, its dynamics model is continually updated to correct any errors in its predictions," they explain.
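The predict-act-observe-correct cycle described above can be sketched in a few lines. Everything here is an illustrative assumption: a hypothetical one-parameter linear "world" whose drift the agent's model must discover by shrinking its own prediction errors.

```python
import random

TRUE_DRIFT = 0.7  # hidden environment parameter the model must discover

def environment(x, action):
    """The real world: the next state depends on an unknown drift term."""
    return x + TRUE_DRIFT * action

class WorldModel:
    """The agent's internal dynamics model, corrected from experience."""

    def __init__(self):
        self.drift = 0.0  # current estimate of the environment's dynamics

    def predict(self, x, action):
        return x + self.drift * action

    def update(self, x, action, observed, lr=0.1):
        """Nudge the model in proportion to its prediction error."""
        error = observed - self.predict(x, action)
        self.drift += lr * error * action

rng = random.Random(1)
model, x = WorldModel(), 0.0
for _ in range(200):
    action = rng.uniform(-1.0, 1.0)
    predicted = model.predict(x, action)   # predict the outcome
    x_next = environment(x, action)        # act in the world, observe
    model.update(x, action, x_next)        # continual correction
    x = x_next

# model.drift has converged close to the true dynamics (0.7)
```

After a stream of a few hundred interactions, the model's drift estimate closes in on the true value; the same error-driven correction is what keeps a dynamics model honest over an agent's ongoing stream of experience.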

Crucially, Silver and Sutton anticipate that humans will still play a vital role in defining the overarching goals that these AI agents strive to achieve. The reward signals then serve to guide the agent towards these broader objectives. For instance, a user might set a goal like "improve my fitness", and the reward function could be based on metrics such as heart rate, sleep duration, and steps taken. Similarly, a "learn Spanish" goal could be linked to Spanish exam results. In this model, human input provides the "top-level goal" that directs the AI's experiential learning.
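A minimal sketch of how a top-level goal might be compiled into a scalar reward from grounded signals, using the article's fitness example. The metric names and weights are illustrative assumptions, not part of Silver and Sutton's proposal.

```python
# Hypothetical reward function for the top-level goal "improve my fitness":
# a weighted sum of grounded signals the agent can actually observe.
FITNESS_WEIGHTS = {
    "resting_heart_rate": -0.5,  # lower is better, hence negative weight
    "sleep_hours": 1.0,
    "steps": 0.0001,
}

def reward(observation, weights=FITNESS_WEIGHTS):
    """Scalar reward: weighted sum of whichever metrics were observed."""
    return sum(w * observation.get(name, 0.0) for name, w in weights.items())

day = {"resting_heart_rate": 62, "sleep_hours": 7.5, "steps": 9000}
r = reward(day)  # -0.5*62 + 1.0*7.5 + 0.0001*9000, approximately -22.6
```

The human contribution is the choice of goal and weights; the day-to-day reward then comes from the world's own signals rather than from per-step human judgment.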

The researchers envision that AI agents with these long-range learning capabilities will be significantly more effective as AI assistants. They could monitor an individual's health over extended periods, offering personalised advice that transcends short-term trends. In education, they could serve as long-term learning companions, tracking student progress over years.

"A science agent could pursue ambitious goals, such as discovering a new material or reducing carbon dioxide," they suggest. "Such an agent could analyse real-world observations over an extended period, developing and running simulations and suggesting real-world experiments or interventions."

Silver and Sutton propose that this "Era of Experience" could potentially eclipse the current focus on "thinking" or "reasoning" AI models. They argue that while reasoning agents often "imitate" human language in their explanations, human thought itself can be constrained by inherent assumptions. An AI trained solely on human reasoning from past eras, for example, might approach scientific problems with outdated frameworks. Experiential agents, by learning directly from the world, could potentially bypass these limitations.

The researchers are optimistic about the transformative potential of experiential AI, predicting "unprecedented capabilities" and "a future profoundly different from anything we have seen before". However, they also acknowledge significant risks. Beyond concerns about job displacement, they highlight the reduced opportunities for human intervention in the actions of agents that can autonomously operate in the world over extended periods to achieve long-term goals.

On a more positive note, they suggest that an agent capable of continuous adaptation could learn to recognise and modify its behaviour in response to human concerns or distress.

Ultimately, Silver and Sutton are confident that the data generated through streams of experience will far surpass the scale and richness of current AI training data, such as Wikipedia and Reddit. This paradigm shift, coupled with advancements in reinforcement learning algorithms, could potentially unlock capabilities that exceed human intelligence in numerous domains, hinting at the eventual arrival of artificial general intelligence or even super-intelligence. "Experiential data will eclipse the scale and quality of human-generated data," they conclude. "This paradigm shift, accompanied by algorithmic advancements in RL, will unlock in many domains new capabilities that surpass those possessed by any human."
