What DeepMind’s AlphaStar beating StarCraft players means for AI research

Jan 23, 2019

After playing benchmark matches back in December, DeepMind’s StarCraft-playing AI AlphaStar has beaten professional players in a series of games.

Blizzard’s StarCraft is a complex e-sports game with no single winning strategy. It has its own AI in single-player mode, but that AI relies on hand-crafted rules, on having somewhat more information about the state of the map and its opponents than actual players do, and on being able to execute commands simultaneously, much faster than humans. Given this complexity, beating humans at StarCraft is considered another huge milestone in AI research. All previous StarCraft AIs relied mostly on a series of manually written rules and restrictions, and none of them came close to a professional player’s level, until now.

But AlphaStar is not Artificial General Intelligence. DeepMind’s systems are not one single model capable of beating humans at Go, chess and StarCraft at the same time. None of their models would be able to beat a human at five-in-a-row, checkers or WarCraft. Still, technologically this achievement might be even more significant than the rest:

  • StarCraft matches are very long, and the number of actions executed is much higher than the number of turns in a Go game. Decisions made at the very beginning of a game might only turn out to be useful during the endgame. Wrong decisions can still disguise themselves as useful ones if the player later manages to fix them and turn the game around. A strategy’s success can only be fully evaluated at the end of each game, and every successful strategy can still be countered by another one (especially in StarCraft). AlphaStar agents were able to learn multiple, generally good long-term strategies without relying on exploitation alone. A minimal sketch of this delayed-credit problem follows this list.
  • Players don’t see the entirety of the map or all of their opponents’ actions, and they only have a rough idea of which strategies their opponents are going to choose, meaning that StarCraft is a game of both incomplete and imperfect information. AlphaStar is nevertheless able to “come up” with successful long-term strategies and even alter them in the face of the unknown. A toy fog-of-war sketch after this list illustrates what the agent does and does not get to see.
  • Actions take time to have an effect: commands are only fully executed later, and only if the environment allows it. AlphaGo could place a stone on any free location, and its decision would alter the board immediately. AlphaStar, on the other hand, needs some notion of time: it has to time its plans ahead and know which strategies are more useful during different phases of the game.
  • Unlike turn-based board games, StarCraft is a real-time strategy game, where players take actions simultaneously. The faster a player can respond to their observations, the better the outcome on their part. The engineers at DeepMind have managed to train a deep learning model that responds to its environment as quickly as a professional would (or even faster, but they disallowed that). Training deep learning models is expensive, but running them is fast and cheap, hence the spread of AI chips running models in new-generation smartphones. Even though the agents were trained for 2+ weeks on 16 Cloud TPU v3s in parallel, each delivering 420 teraflops (for comparison, a PS4 Pro has a 4.2-teraflops GPU), the resulting AlphaStar league models fit on a pen drive and can even run on an average desktop PC. A back-of-the-envelope estimate of that training cost also follows this list.
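
To make the credit-assignment point concrete, here is a minimal, illustrative sketch in Python (my own, not DeepMind’s code) of computing discounted returns when the only reward is a win/loss signal at the very end of a long match; the episode length and discount factor are arbitrary assumptions:

```python
def discounted_returns(rewards, gamma=0.999):
    """Return G_t for every step of an episode, working backwards.

    When the only nonzero reward is the win/loss signal at the final
    step, every earlier action still receives (discounted) credit
    for the eventual outcome.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# A StarCraft-length episode: thousands of decisions, reward only at the end.
episode = [0.0] * 9999 + [1.0]         # 1.0 = a win
print(discounted_returns(episode)[0])  # opening move's credit: 0.999 ** 9999
```

With ten thousand steps, even a discount factor of 0.999 leaves the opening moves with almost no signal (0.999^9999 is roughly 4.5e-5), which is one way to see why long-horizon credit assignment in StarCraft is so much harder than in Go.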
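
The imperfect-information point can likewise be pictured as a fog-of-war mask over the true game state. This toy sketch (the array layout, the FOG value and the vision rule are my own assumptions, not the actual AlphaStar interface) shows how the agent’s observation differs from the full map:

```python
import numpy as np

FOG = -1  # placeholder for cells the agent cannot see

def observe(full_map: np.ndarray, vision: np.ndarray) -> np.ndarray:
    """Mask the true map down to what the agent's units can see.

    full_map: the complete game state (hidden from the agent).
    vision:   boolean mask, True where the agent has line of sight.
    """
    obs = np.full_like(full_map, FOG)
    obs[vision] = full_map[vision]
    return obs

full_map = np.array([[0, 2, 0],
                     [1, 0, 2],
                     [0, 1, 0]])  # 1 = own unit, 2 = enemy unit
vision = full_map != 2            # toy rule: enemy cells are out of sight
print(observe(full_map, vision))  # enemy positions show up as -1
```

Every plan the agent makes has to be robust to those -1 cells: the opponent’s army may be anywhere the agent cannot currently see.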
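
Finally, the training cost quoted in the last bullet can be sanity-checked with some back-of-the-envelope arithmetic (assuming full hardware utilization, which real training never reaches):

```python
# Figures from the bullet above: 16 Cloud TPU v3s at 420 TFLOPS each,
# running for roughly two weeks; a PS4 Pro's GPU manages 4.2 TFLOPS.
tpu_count, tflops_per_tpu, training_days = 16, 420.0, 14
ps4_pro_tflops = 4.2

seconds = training_days * 24 * 3600
total_flops = tpu_count * tflops_per_tpu * 1e12 * seconds
print(f"Training compute: ~{total_flops:.1e} FLOPs")  # ~8.1e21

years_on_ps4 = total_flops / (ps4_pro_tflops * 1e12) / (365 * 24 * 3600)
print(f"Same work on a PS4 Pro: ~{years_on_ps4:.0f} years")  # ~61 years
```

Running the finished model, by contrast, costs only a tiny fraction of that per decision, which is why the trained agents fit on a pen drive and run on ordinary hardware.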

So why is this still not General Intelligence?

The games were all played on one single map, with one single race (Protoss) playing against itself. A minimap-like representation of the playing field was fed to the models instead of a series of screenshots from StarCraft. AlphaStar has not learned StarCraft; it has learned which actions to execute on a map and how to time them, based on the available information. This is no minor difference: AlphaStar would probably fail to beat humans on different maps, with or against different races, or even at playing a different version of the game. Although newer versions of it will most likely manage to play other races, the abilities learned will still differ from how we perceive and select actions in an environment.

Don’t get me wrong, AlphaStar winning is still a huge achievement for humankind, but it also means that it has only succeeded in a narrow domain: coming up with a series of useful actions does not equate to being a strategist across different fields. This is clearest in the very last game against MaNa, where he finds an exploit in AlphaStar’s strategy and abuses it repeatedly to finally win (forcing human opponents to make mistakes, or computer players to repeat a suboptimal strategy, is itself a tactic used by pro players). Most human players could have avoided his trick by simply building a single countering unit and putting a stop to it. But AlphaStar had never before seen such a situation, and did not have a learned series of actions to execute in response.

AlphaStar has not learned StarCraft. Yet it is a never-before-seen solution for areas where strategic planning is needed.

AlphaStar winning means that, in theory, we now have the technology to use AI in fields where long-term strategies are needed, information is only partly available, and real-time decision making is key. Despite not being Artificial General Intelligence, it is a promising solution to many of the complex cognitive problems humans have had to figure out on their own until now. We just still have to narrow the problem down somewhat if we want AI to tackle it successfully.

Source: Towards Data Science