OpenAI, the research organisation co-founded by Elon Musk, has made history by creating artificial intelligence-based bots that can play and compete as a team in one of the most complicated strategy games in esports. They have even beaten the reigning champions.

It may seem like a foregone conclusion that computer programs should be able to beat humans at almost any game they are trained to play; however, such conclusions do not typically take into consideration the complexity of most multiplayer online battle arena (MOBA) games. It was not until 2016 that a computer program was able to beat one of the world’s top professionals at the ancient Chinese board game Go, and computer games such as MOBAs increase the number of variables for a computer program to take into account by a factor of thousands.

The MOBA chosen for this feat was Dota 2, the game with the biggest prize pool in all of esports (over USD 25 million for the biggest tournament of 2018). It’s worth taking a little time to explain just how complex Dota 2 is, to understand what an accomplishment this is.

Image: Prize pools in major tournaments

Dota 2 (typically just referred to as Dota) is a game where two teams of five battle each other in a complex strategy setting with the objective of destroying the opposing team’s base. The original game, Defense of the Ancients, was a community-developed mod for Warcraft III back in 2003, and is now played almost exclusively by those seeking a nostalgia kick. At the opening stage of a game, the two teams take turns picking heroes (playable characters) from a pool of over 100 different beasts and wizards.

For added complexity, throughout the game each player earns gold for completing certain tasks, such as taking an enemy tower or picking up a bounty rune. With this gold, players can purchase items which improve their hero in a certain way – there are hundreds of items to choose from. As if that wasn’t enough, heroes also develop throughout each game, gaining strength and additional abilities the longer the game goes on and the better a player performs. Hopefully this gives some idea of the complexity that an AI would need not just to learn, but to master.

Image: Dota 2 from a typical player’s perspective

Back in 1997, when Deep Blue beat Garry Kasparov, then the reigning world chess champion, the methods used to beat the human player were vastly different from the modern reinforcement learning methods used by the OpenAI bots. Deep Blue effectively brute-forced its way through the opposition, using its vast calculating speed to methodically search through possible move combinations and pick the one most likely to result in a win. The OpenAI bots have far more to contend with than the possible permutations of heroes, items and skill builds (there is no point in quoting such numbers, especially when they begin to enter the realms of “more than there are atoms in the universe”). The real added complexity, far above and beyond that of a chess-playing computer program, is the fact that Dota 2 is played in real time – there is no turn-based time to pause and think (for the bots or the humans!).

Instead of the old brute force methods, the OpenAI bots use reinforcement learning, whereby they are rewarded for “good” outcomes and use past experience to bring about these positive outcomes more often. To get to this standard of gameplay, the OpenAI bots have reportedly played the equivalent of 45,000 years’ worth of Dota 2.
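For readers curious about what “learning from rewards” looks like in practice, the toy sketch below may help. It is emphatically not OpenAI’s training method, which operates at a vastly larger scale; it is a simple trial-and-error learner (often called an epsilon-greedy bandit) choosing between three hypothetical tactics. The function name `train_bandit`, the win probabilities and every parameter value are assumptions made up purely for illustration.

```python
import random


def train_bandit(win_chances, rounds=5000, explore=0.1, seed=0):
    """Learn by trial and error which tactic tends to win most often.

    win_chances holds the hidden probability that each hypothetical tactic
    wins. The learner never reads these directly; it only sees rewards.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(win_chances)  # learned value of each tactic
    plays = [0] * len(win_chances)        # how often each tactic was tried

    for _ in range(rounds):
        if rng.random() < explore:
            # Occasionally try a random tactic to gather new experience.
            choice = rng.randrange(len(win_chances))
        else:
            # Otherwise use the tactic that past experience rates highest.
            choice = max(range(len(estimates)), key=lambda i: estimates[i])

        # The "environment" hands back a reward: 1 for a win, 0 for a loss.
        reward = 1.0 if rng.random() < win_chances[choice] else 0.0

        # Nudge the running average for that tactic towards the new result.
        plays[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / plays[choice]

    return estimates, plays


estimates, plays = train_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda i: estimates[i])
print("Preferred tactic:", best)
print("Times each tactic was tried:", plays)
```

The learner is never told which tactic is best. It simply keeps a running score of how each one has paid off and leans towards the highest scorer, while still experimenting now and then. Scaling that basic idea up to a real-time game with five cooperating players is where the thousands of years of simulated play come in.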

The human vs robot match itself, for a seasoned spectator of Dota 2 at least, was fascinating to watch, as the bots do not play like humans at all; they have developed their own set of tactics, which to the uninitiated observer would look bizarre or even counterproductive. The primary example of this is the bots’ utilisation of “buy-backs,” a game mechanic that allows a player who has fallen in battle to use gold they have earned to “respawn” (re-join the game) instantly, rather than waiting the required 60-120 seconds. Given how valuable and hard-earned gold is to players, buy-backs are typically only used as a last resort, when not doing so would result in a certain loss. The bots, however, use buy-backs instantly, at every opportunity. It’s notable that the only time human players do this is when they are overwhelmed by frustration, which has earned the move the nickname “rage buy-back,” a manoeuvre that often attracts ridicule as a sign of not being able to keep one’s cool. This relentless use of the rage buy-back gives the bots a Terminator-style persistence in their gameplay, which is truly quite terrifying.

This effectively brings OpenAI’s experiment with Dota-playing bots to a conclusion; they have achieved the ultimate goal, and Sam Altman, co-founder and CEO of OpenAI, states that there are now no multiplayer games that the bots could not ultimately master beyond human capability. As fun as it may be to create the ultimate gaming machines, OpenAI will now take its research into the real world…

For more on AI, machine learning, deep neural networks and the legal and regulatory issues that touch upon them, come along to DLA Piper’s European Tech Summit in London on 15 October.