- Imitation Learning
- 3 LSTM NNs
- AlphaStar plays the full game of StarCraft II, using a deep neural network that is trained directly from raw game data by supervised learning and reinforcement learning.
- To start, a player must choose to play one of three different alien “races” – Zerg, Protoss or Terran, all of which have distinctive characteristics and abilities
- Each player starts with a number of worker units, which gather basic resources to build more units and structures and create new technologies. These in turn allow a player to harvest other resources, build more sophisticated bases and structures, and develop new capabilities that can be used to outwit the opponent. To win, a player must carefully balance big-picture management of their economy – known as macro – along with low-level control of their individual units – known as micro.
- AlphaStar’s behaviour is generated by a deep neural networkthat receives input data from the raw game interface (a list of units and their properties), and outputs a sequence of instructions that constitute an action within the game. More specifically, the neural network architecture applies a transformer torso to the units (similar to relational deep reinforcement learning), combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralised value baseline. We believe that this advanced model will help with many other challenges in machine learning research that involve long-term sequence modelling and large output spaces such as translation, language modelling and visual representations.
AlphaStar also uses a novel multi-agent learning algorithm. The neural network was initially trained by supervised learning from anonymised human games released by Blizzard. This allowed AlphaStar to learn, by imitation, the basic micro and macro-strategies used by players on the StarCraft ladder. This initial agent defeated the built-in “Elite” level AI – around gold level for a human player – in 95% of games.
These were then used to seed a multi-agent reinforcement learning process. A continuous league was created, with the agents of the league – competitors – playing games against each other, akin to how humans experience the game of StarCraft by playing on the StarCraft ladder. New competitors were dynamically added to the league, by branching from existing competitors; each agent then learns from games against other competitors. This new form of training takes the ideas of population-based and multi-agent reinforcement learning further, creating a process that continually explores the huge strategic space of StarCraft gameplay, while ensuring that each competitor performs well against the strongest strategies, and does not forget how to defeat earlier ones.