VideoGameBench, a new tool developed to test how well artificial intelligence (AI) models can play video games, has revealed that even advanced models still struggle with older, simpler titles.
The benchmark was designed to evaluate vision-language models such as GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro on a set of 20 popular games, including Doom, Prince of Persia, and Warcraft II.
Instead of relying on code or special inputs, the models were given only the game screen to decide their next move. The AI takes a screenshot, analyzes it, suggests an action, and then tries to carry it out.
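The screenshot, analyze, act cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark's actual code: `ToyGame`, `stub_model`, and `play` are hypothetical names, the "screenshot" is a short text string rather than real pixels, and the stub stands in for a call to a vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyGame:
    """Hypothetical stand-in for an emulator: the player (P) walks to a goal (G)."""
    position: int = 0
    goal: int = 3

    def screenshot(self) -> str:
        # A real harness would return pixels; a text "frame" stands in here.
        cells = []
        for i in range(self.goal + 1):
            if i == self.position:
                cells.append("P")
            elif i == self.goal:
                cells.append("G")
            else:
                cells.append(".")
        return "".join(cells)

    def press(self, button: str) -> None:
        # Apply the chosen button to the game state.
        if button == "RIGHT":
            self.position = min(self.position + 1, self.goal)
        elif button == "LEFT":
            self.position = max(self.position - 1, 0)

    def done(self) -> bool:
        return self.position == self.goal


def stub_model(frame: str) -> str:
    """Placeholder for a vision-language model: read the frame, name a button."""
    if "G" not in frame:
        return "NOOP"
    return "RIGHT" if frame.index("P") < frame.index("G") else "LEFT"


def play(game: ToyGame, model: Callable[[str], str], max_steps: int = 10) -> List[str]:
    """Run the loop: take a screenshot, ask the model, carry out its action."""
    actions: List[str] = []
    for _ in range(max_steps):
        if game.done():
            break
        frame = game.screenshot()   # 1. capture the screen
        action = model(frame)       # 2. model analyzes it and suggests an action
        game.press(action)          # 3. try to carry the action out
        actions.append(action)
    return actions
```

Calling `play(ToyGame(), stub_model)` walks the toy player to the goal in three steps. In a real setup, step 2 would be a slow model call, and that is where the timing problems come from.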
The time a model takes to process each screenshot is especially noticeable in fast-paced games like Doom, where quick reactions are key. If the AI takes too long to respond, the situation on the screen has already changed, which makes its decision outdated. For example, an enemy might have moved, or the player may already be in danger before the model responds.
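A rough back-of-the-envelope sketch shows why response time matters so much. The function names and every number used here (frame rate, latency, enemy speed, aiming tolerance) are illustrative assumptions, not figures reported by the benchmark.

```python
def frames_elapsed(latency_ms: int, fps: int) -> int:
    """Whole game frames that pass while the model is still deciding."""
    return latency_ms * fps // 1000


def is_stale(latency_ms: int, fps: int, speed_px_per_frame: int, tolerance_px: int) -> bool:
    """True if a target moving at a steady speed has drifted past the tolerance
    by the time the model's action arrives."""
    drift_px = frames_elapsed(latency_ms, fps) * speed_px_per_frame
    return drift_px > tolerance_px


# Hypothetical example: a 2-second response in a game running at 30 frames per second.
print(frames_elapsed(2000, 30))      # frames that went by while the model was thinking
print(is_stale(2000, 30, 2, 16))     # slow response: is the decision outdated?
print(is_stale(100, 30, 2, 16))      # fast response, same target
```

Under these assumed numbers, 60 frames pass during a 2-second response and the target has drifted well beyond the tolerance, while a 100-millisecond response stays within it.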
According to the research team, current models are not only slow to react but also struggle with basic tasks. They often miss items, fail to interact with the environment properly, or keep repeating the same actions without making progress.
The team used older Game Boy and MS-DOS games because their simple graphics and variety of control types provide a good way to test how well models understand space and timing.
The benchmark was developed by computer scientist Alex Zhang, who explained that these games help reveal how much work is still needed before AI can play games reliably in real time.
Meanwhile, on April 14, Meta received approval from the EU’s data regulator to use public posts from its platforms to train its AI systems.
Having completed a Master’s degree in Economics, Politics, and Cultures of the East Asia region, Aaron has written scientific papers analyzing the differences between Western and Collective forms of capitalism in the post-World War II era. With close to a decade of experience in the FinTech industry, Aaron understands all of the biggest issues and struggles that crypto enthusiasts face. He’s a passionate analyst who is concerned with data-driven and fact-based content, as well as that which speaks to both Web3 natives and industry newcomers. Aaron is the go-to person for everything and anything related to digital currencies. With a huge passion for blockchain & Web3 education, Aaron strives to transform the space as we know it, and make it more approachable to complete beginners. Aaron has been quoted by multiple established outlets, and is a published author himself. Even during his free time, he enjoys researching the market trends, and looking for the next supernova.