AI agents developed by Google’s DeepMind subsidiary have beaten human pros at Starcraft II — a first in the world of artificial intelligence. In a series of matches streamed on YouTube and Twitch, AI players beat the humans 10 games in a row. In the final match, pro player Grzegorz “MaNa” Komincz was able to snatch a single victory for humanity.
“The history of AI has been marked by a number of significant benchmark victories in different games,” David Silver, DeepMind’s research co-lead, said after the matches. “And I hope — though there’s clearly work to do — that people in the future may look back at [today] and perhaps consider this as another step forward for what AI systems can do.”
Beating humans at video games might seem like a sideshow in AI development, but it's a significant research challenge. Games like Starcraft II are harder for computers to play than board games like chess or Go. In video games like this, AI agents can't see the position of every piece when calculating their next move, and they have to react in real time rather than taking turns.
A screenshot from the games in December, showing AlphaStar facing off against TLO.
These factors didn't seem like much of an impediment to DeepMind's AI system, dubbed AlphaStar. It first beat pro player Dario "TLO" Wünsch before moving on to MaNa. The games were originally played in December last year at DeepMind's London HQ, but a final match against MaNa was streamed live today, providing humans with their single victory.
Professional Starcraft commentators described AlphaStar’s play as “phenomenal” and “superhuman.” In Starcraft II, players start on different sides of the same map before building up a base, training an army, and invading the enemy’s territory. AlphaStar was particularly good at what’s called “micro,” short for micromanagement, referring to the ability to control troops quickly and decisively on the battlefield.
Even though the human players sometimes managed to train more powerful units, AlphaStar was able to outmaneuver them in close quarters. In one game, AlphaStar swarmed MaNa with a fast-moving unit called the Stalker. Commentator Kevin "RotterdaM" van der Kooi described it as "phenomenal unit control, just not something we see very often." MaNa noted after the match: "If I play any human player they're not going to be microing their Stalkers this nicely."
This echoes behavior we've seen from other high-level game-playing AI. When OpenAI's agents played human pros at Dota 2 last year, they were ultimately defeated. But experts noted that, like AlphaStar, those agents played with a "clarity and precision" that was "hypnotic." Making quick decisions without any errors is, unsurprisingly, a machine's home turf.
Experts have already begun to dissect the games and argue over whether AlphaStar had any unfair advantages. The AI agent was hobbled in some ways. For example, it was restricted from performing more clicks per minute than a human. But unlike human players, it was able to view the whole map at once, rather than navigating it manually.
DeepMind's researchers said this provided no real advantage, as the agent only focuses on a single part of the map at any one time. But, as the games showed, this didn't stop AlphaStar from expertly controlling units in three different areas simultaneously, something the commentators said would be impossible for humans. Notably, when MaNa beat AlphaStar in the live match, the AI was playing with a restricted camera view.
Another potential sore point was that the human players, while professionals, were not of world-champion standard. TLO also had to play as one of Starcraft II's three races, one he does not usually play.
A graphical representation of AlphaStar's processing. The system sees the whole map from the top down and predicts what behavior will lead to victory.
This discussion aside, experts say the matches were a significant step forward. Dave Churchill, an AI researcher who’s long been involved in the Starcraft AI scene, told The Verge: “I think that the strength of the agent is a significant accomplishment, and came at least a year ahead of the most optimistic guesses that I’ve heard among AI researchers.”
However, Churchill added that as DeepMind had yet to release any research papers about the work, it was difficult to say whether or not it showed any technological leap forward. “I have not read the blog article yet or had access to any papers or technical details to make that call,” said Churchill.
Mark Riedl, an associate professor at Georgia Tech who works on AI, said he was less surprised by the results and that this victory had only been "a matter of time." Riedl added that he didn't think the games showed that Starcraft II had been definitively beaten. "In the last, live game, restricting AlphaStar to the window did remove some of its artificial advantage," said Riedl. "But the bigger issue that we have seen… is that the policy learned [by the AI] is brittle, and when a human can push the AI out of its comfort zone, the AI falls apart."
Ultimately, the end goal of work like this is not to beat humans at video games but to sharpen AI training methods, particularly in order to create systems that can operate in complex virtual environments like Starcraft.
In order to train AlphaStar, DeepMind’s researchers used a method known as reinforcement learning. Agents play the game essentially by trial and error while trying to reach certain goals like winning or simply staying alive. They learn first by copying human players and then play one another in a coliseum-like competition. The strongest agents survive, and the weakest are discarded. DeepMind estimated that its AlphaStar agents each racked up about 200 years of game time in this way, played at an accelerated rate.
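The "strongest survive, weakest are discarded" part of that process can be caricatured in a few lines of code. The Python sketch below is a toy illustration under heavy assumptions, not DeepMind's training code. Each hypothetical `Agent` is reduced to a single `skill` number instead of a neural network, match outcomes use an assumed Elo-style logistic win probability, and the learning that happens between generations is faked by copying a strong agent and nudging its skill at random. It models only the league-style selection, not reinforcement learning itself, and the names `Agent`, `play_match` and `run_league` are invented for illustration.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical stand-in for a trained policy, reduced to one "skill" number."""
    name: str
    skill: float
    wins: int = 0


def play_match(a: Agent, b: Agent, rng: random.Random) -> Agent:
    """Return the winner; the stronger agent wins more often (assumed logistic odds)."""
    p_a_wins = 1.0 / (1.0 + math.exp(b.skill - a.skill))
    return a if rng.random() < p_a_wins else b


def run_league(population: list[Agent], generations: int, cull: int,
               seed: int = 0) -> list[Agent]:
    """Toy selection loop: round-robin matches, drop the weakest, refill from the strongest."""
    if not 0 < cull < len(population):
        raise ValueError("cull must be between 1 and len(population) - 1")
    rng = random.Random(seed)
    population = list(population)  # don't reorder the caller's list
    next_id = len(population)      # assumes initial names are agent0..agentN-1
    for _ in range(generations):
        for agent in population:
            agent.wins = 0
        # Every agent plays every other agent once.
        for i in range(len(population)):
            for j in range(i + 1, len(population)):
                winner = play_match(population[i], population[j], rng)
                winner.wins += 1
        # Rank by wins; the `cull` agents with the fewest wins are discarded.
        population.sort(key=lambda ag: ag.wins, reverse=True)
        survivors = population[:len(population) - cull]
        # Refill with perturbed copies of the top survivors: a crude,
        # hypothetical stand-in for further training between generations.
        children = []
        for k in range(cull):
            parent = survivors[k % len(survivors)]
            children.append(Agent(f"agent{next_id}", parent.skill + rng.gauss(0.1, 0.2)))
            next_id += 1
        population = survivors + children
    return population


if __name__ == "__main__":
    init_rng = random.Random(42)
    league = [Agent(f"agent{i}", init_rng.uniform(0.0, 1.0)) for i in range(8)]
    final = run_league(league, generations=20, cull=2)
    for ag in final[:3]:
        print(ag.name, round(ag.skill, 2), ag.wins)
```

A round-robin is simply the easiest way to rank a handful of agents; a larger league would more plausibly sample opponents rather than play every pairing. Because ranking by raw wins is noisy, a genuinely strong agent can occasionally be discarded after an unlucky generation.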
DeepMind was clear about its goal in conducting this work. “First and foremost the mission at DeepMind is to build an artificial general intelligence,” said Oriol Vinyals, co-lead of the AlphaStar project, referring to the quest to build an AI agent that can perform any mental task a human being can. “To do so, it’s important to benchmark how our agents perform on a wide variety of tasks.”