 
Key takeaways:
- Google's new Game Arena will allow models to compete in games head-to-head.
- You can tune in to the Game Arena at 12:30 p.m. ET Tuesday.
- The effort could open the door to new business applications.
As artificial intelligence evolves, it's becoming increasingly difficult to accurately measure the performance of individual models.
To address that problem, Google on Tuesday unveiled the Game Arena, an open-source platform in which AI models compete in a variety of strategic games to provide "a verifiable, and dynamic measure of their capabilities," as the company wrote in a blog post.
The new Game Arena is hosted on Kaggle, another Google-owned platform, where machine learning researchers can share datasets and compete with one another to complete various challenges.
This comes as researchers have been working on new kinds of tests to measure the capabilities of AI models as the field inches closer to artificial general intelligence, or AGI, an as-yet theoretical system that (as it's commonly defined) can match the human brain in any cognitive task.
Serious play
Google's new Game Arena initiative aims to push the capabilities of existing AI models while simultaneously providing a bounded framework for analyzing their performance.
"Games provide a clear, unambiguous signal of success," Google wrote in its blog post. "Their structured nature and measurable outcomes make them the perfect testbed for evaluating models and agents. They force models to demonstrate many skills including strategic reasoning, long-term planning and dynamic adaptation against an intelligent opponent, providing a robust signal of their general problem-solving intelligence."
Critically, games are also scalable; it's easy to increase the level of difficulty, thus theoretically pushing the models' capabilities.
"The goal is to build an ever-expanding benchmark that grows in difficulty as models face tougher competition," the blog post notes.
Ultimately, the initiative could lead to advancements beyond the realm of games. Google noted in its blog post that as the models become increasingly adept at gameplay, they could exhibit surprising new strategies that reshape our understanding of the technology's potential.
It could also help to inform R&D efforts in more economically practical arenas: "The ability to plan, adapt, and reason under pressure in a game is analogous to the thinking needed to solve complex challenges in science and business," Google said.
All fun and games
Artificial intelligence has always been about games.
The field emerged in the mid-20th century in conjunction with game theory, or the mathematical study of strategic interaction between competing entities. Today's models "learn" essentially by playing millions of rounds of games against themselves and refining their performance based on how well they achieve some predetermined goal, which can range from predicting the next token of text to generating a video depicting real-world physics.
Games have also long been an important benchmark that AI researchers have used to assess model performance and capability. Meta's Cicero, for example, was trained to analyze millions of games of the board game Diplomacy played by humans. Through a large language model, Cicero learned to play Diplomacy by typing the words it believed a human player would say in each move. Its performance was then measured through gameplay with human users, who assessed its ability to make strategic decisions and communicate those through natural language.
And unlike more esoteric benchmarks like the International Mathematical Olympiad, games offer a relatable context for the average layperson. It may not mean much to non-experts when they hear that an AI model beat human experts at debugging computer code, for example, but it packs a weighty emotional punch when a chess grandmaster, say, is defeated by a computer, as happened in 1997 when IBM's Deep Blue defeated then-world champion Garry Kasparov in a six-game match.
Games can also help to reveal new and unexpected behavior from algorithms. One of the most famous (or infamous, depending on your point of view) moments in the history of AI was AlphaGo's "Move 37" during the model's historic 2016 match against Go champion Lee Sedol. In the moment, the move vexed human experts, who said it defied logic. But as the game progressed, it became clear that the move had in fact been a stroke of unconventional and creative brilliance, one that helped AlphaGo defeat Lee.
You can tune in to the Game Arena at 12:30 p.m. ET on Tuesday to watch a chess showdown between eight frontier AI models.