While planning the launch of Halite II, we at Two Sigma knew it would be important to provide better machine learning (ML) support. Last year we saw an ML bot reach 12th place in the competition. We also heard from the community that some of the game constraints -- bandwidth limits as well as limits on how much time bots could use -- had significantly limited the ability of ML bots to succeed. The Halite team removed these technical barriers by providing GPU access to some players as part of hosting the game, and by increasing time limits for bot processing. But we also wanted to launch this year with more starter kit support for machine learning right out of the gate.
The Halite team decided we needed machine learning expertise on board to build the ML starter kit. We met with Sat Chatterjee who heads up the Deep Learning team at Two Sigma to evaluate how to proceed. He recruited Piotr Zielinski, an engineer with a strong interest in machine learning, to work on this, and the two of them outlined a plan for a starter bot. You can see the result in Piotr’s tutorial here. You can see the ML starter bot play on the leaderboard as tscommander-ML.
Choosing an ML Approach
With all the excitement around AlphaGo and AlphaGo Zero, it was natural to consider applying reinforcement learning, particularly through self-play, to teach a bot to play Halite. Although that would have been a very interesting research project, the expected complexity of the solution and the computational power needed for training put it out of scope for a starter bot. Instead, Piotr and Sat decided to start with supervised learning, since it would be easier to get such a bot off the ground quickly. Furthermore, it would allow anyone interested to start playing with the ML starter bot on their laptop in short order.
What should ML do?
When thinking about a supervised learning approach to Halite, Piotr started by selecting a specific question for his ML bot to answer. Halite is a complex game, with a branching factor billions of times greater than that of Go, so narrowing the task to a single question would let the ML focus on a relatively simple problem without grappling with the full complexity of Halite. To start, Piotr gave the bot the question:
“Which planets are the most attractive?”
Any decisions about what ships did once they reached a planet would be handled by procedural code and heuristics, as in a non-ML bot. Only this one decision would be made by ML, and the answer would take the form "send x% of our ships to planet A, y% to planet B, etc.".
And we’d learn this by looking at a large number of historical games to see what winning bots normally did.
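As a concrete illustration, here is a minimal sketch of how per-planet scores could be turned into an allocation of that form. This is not the starter kit's actual code; `allocate_ships` and its inputs are assumptions for illustration.

```python
def allocate_ships(planet_scores, num_ships):
    """Turn per-planet attractiveness scores into ship counts.

    planet_scores: dict mapping planet id -> nonnegative model score.
    Returns a dict mapping planet id -> number of ships to send.
    """
    total = sum(planet_scores.values())
    if total == 0:
        # No planet looks attractive: fall back to an even split.
        share = {p: 1 / len(planet_scores) for p in planet_scores}
    else:
        share = {p: s / total for p, s in planet_scores.items()}
    # Round each share down, then hand leftover ships to the top planets.
    alloc = {p: int(share[p] * num_ships) for p in planet_scores}
    leftover = num_ships - sum(alloc.values())
    for p in sorted(planet_scores, key=planet_scores.get, reverse=True)[:leftover]:
        alloc[p] += 1
    return alloc
```

The rounding step matters: naive percentage-to-count conversion can strand ships, so leftovers go to the highest-scoring planets.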
Architecture and Training
Designing a neural network to address this problem proved fairly straightforward, but with one interesting twist. The network received as inputs various attributes of the planets computed from the current game state, but since Halite can have many different maps and different numbers of planets, it made sense to make the network equivariant to permutations of the inputs, i.e., the score assigned to a planet depends solely on that planet's features and not on its position in the input. This also makes it harder for the net to overfit to the training data (always a challenge when dealing with neural nets). See the code here for more details about how we normalize planet features within each frame.
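Permutation equivariance can be sketched by applying one shared network row-by-row to the planet features. The toy NumPy version below is illustrative only, not the starter kit's actual network:

```python
import numpy as np

def init_params(num_features, hidden=16, seed=0):
    # A tiny two-layer MLP shared across all planets (illustrative weights).
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(num_features, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def score_planets(params, planet_features):
    """planet_features: (num_planets, num_features) array.

    The same weights are applied to each row independently, so permuting
    the rows of the input permutes the output scores in exactly the same
    way -- the network cannot key on a planet's position in the list.
    """
    h = np.maximum(planet_features @ params["W1"] + params["b1"], 0.0)  # ReLU
    return (h @ params["W2"] + params["b2"]).ravel()
```

Reordering the input planets reorders the scores identically, which is the equivariance property described above.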
Once we’d set up the neural net and starter bot, we needed to start training it. We keep all games ever played in a cloud storage bucket. We created a training function that takes a set of games (we started with a pack of 1000) and trains the bot: it parses the game data, identifies bots worth imitating, learns from their winning play, and then saves the trained model for the bot to use in live games.
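A rough sketch of what such a training-data function might look like, assuming a hypothetical replay JSON schema and a `parse_frame` callback -- neither is the starter kit's real API:

```python
import json

def load_training_examples(replay_paths, parse_frame, min_rank=1):
    """Build supervised examples from a batch of replay files.

    parse_frame is a hypothetical callback that turns one frame of a
    replay into a (features, target) pair for a given player. We keep
    only players worth imitating -- by default, the winner (rank 1).
    """
    examples = []
    for path in replay_paths:
        with open(path) as f:
            replay = json.load(f)
        # Hypothetical schema: 'stats' maps player id -> final standing.
        winners = [pid for pid, s in replay["stats"].items()
                   if s["rank"] <= min_rank]
        for pid in winners:
            for frame in replay["frames"]:
                examples.append(parse_frame(frame, pid))
    return examples
```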
Evaluating the ML Bot
Our most interesting observation from building the ML bot was how difficult it is to know whether ML is helping at all. With one of our earliest bots, we were excited to see it do reasonably well against the other bots on the early leaderboard, but when we ran the null test (i.e. replaced the neural network's output with random numbers), we found that the bot continued to do just as well. The procedural code and heuristics surrounding the neural network were doing so much of the work that the network was adding little value on top.
This motivated a different evaluation metric for ML going forward: instead of just comparing the ML bot to other bots on the board, we started comparing ML bots to each other, pitting bots with different amounts of training (including none) against one another to make sure that ML was actually adding something.
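Both the null test and the ML-vs-ML comparison amount to swapping the scoring function and measuring the win rate. A minimal sketch, with `play_match` as a hypothetical match harness:

```python
import random

def null_scorer(planet_features, rng=random):
    # Ignore the features entirely and return random scores -- the
    # drop-in replacement for the trained network used in a null test.
    return [rng.random() for _ in planet_features]

def compare_scorers(play_match, scorer_a, scorer_b, games=100):
    """Play `games` matches between two bots that differ only in scorer.

    play_match is a hypothetical harness: given two scoring functions,
    it plays one game and returns True if the first bot wins. A win
    rate near 0.5 against null_scorer means the heuristics, not the
    network, are doing the work.
    """
    wins = sum(play_match(scorer_a, scorer_b) for _ in range(games))
    return wins / games
```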
Another interesting improvement we made was to hard-code the random seed in the starter kit. We noticed that every now and then (~1 out of 10 games) the ML bot with less training would win, because the randomness of the map occasionally produced situations that led to strange behaviors. So that all users would have the same experience when they first used the starter bot, we decided to hard-code the seed for more consistency. You can see more about how we did ML vs. ML training here.
One final issue that Piotr and Sat encountered when writing the ML bot was that pathfinding with the default navigate routine (in the Python starter bot) took far too long and led to timeouts when there were many ships. Since pathfinding was not the main focus, Piotr worked around it by using the navigation routine only when the bot had sufficient compute budget left, and otherwise heading in a straight line.
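A budget-guarded fallback of this kind might look like the following sketch. The method names loosely mirror the Python starter kit but should be treated as assumptions, as should the budget constants:

```python
import time

TURN_BUDGET_SECONDS = 1.5  # assumed per-turn time limit, for illustration

def choose_move(ship, target, game_map, turn_start,
                budget=TURN_BUDGET_SECONDS, reserve=0.3):
    """Use expensive pathfinding only while compute budget remains.

    If less than `reserve` seconds of the turn budget are left, skip the
    full navigate routine and thrust straight at the target instead.
    """
    elapsed = time.time() - turn_start
    if budget - elapsed > reserve:
        return ship.navigate(target, game_map)   # obstacle-avoiding path
    # Out of budget: head straight for the target at full speed.
    angle = ship.calculate_angle_between(target)
    return ship.thrust(magnitude=7, angle=angle)
```

The straight-line fallback occasionally collides with obstacles, but a cheap legal move beats a timeout that forfeits the game.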
Next Steps for the Bot
There are many different directions in which we could proceed from the starter bot. For example, sticking with supervised learning, it may be possible to learn to score the different actions a ship may take given the board state and its location on it. It would be natural to use convolutional nets for this purpose. One could also try to learn a global utility (or a global flow) for all the actions all the ships could take as a whole. A more challenging approach would be to learn using reinforcement learning against an existing set of bots or perhaps even through self-play.
Lessons Learned for Halite III
Writing the starter bot led to a couple of interesting insights about how the competition framework itself could be made friendlier for machine learning:
Right now there are two paths in the ML starter bot code for feature construction. The first path is for training where we read in from a replay JSON and build the features. The second path is during game play when we query the game engine data structures to build the features. This duplication introduced quite a few bugs, so for the next iteration we could expose replays and live game state using the same API (or perhaps more precisely the live game state could be a “sub-API” of the replay API).
In order to encourage reinforcement learning and other self-play techniques, we need a fast simulator that we can invoke along with one or more bots without crossing process boundaries and doing IO. In other words, the simulator should be a library that we can call into either from Python or C++.
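For the first point, a single adapter layer over both sources could look something like this sketch; the replay schema and engine method names here are assumptions, not the real kit API:

```python
from dataclasses import dataclass

@dataclass
class PlanetInfo:
    health: float
    radius: float

class ReplayView:
    """Adapter over one frame of a replay JSON (hypothetical schema)."""
    def __init__(self, frame):
        self._frame = frame
    def planets(self):
        return [PlanetInfo(p["health"], p["radius"])
                for p in self._frame["planets"].values()]

class LiveView:
    """Adapter over the live game engine objects (hypothetical API)."""
    def __init__(self, game_map):
        self._map = game_map
    def planets(self):
        return [PlanetInfo(p.health, p.radius)
                for p in self._map.all_planets()]

def build_features(view):
    # One code path for feature construction, shared by training and play,
    # so the two can no longer drift apart and introduce bugs.
    return [(p.health, p.radius) for p in view.planets()]
```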
We’re excited to see our starter bot in the top portion of the competition -- as of this writing it is ranked 690 out of the 5721 players on the board. What’s even more exciting is that over 100 different players are using ML as their bot language, and one ML player has achieved rank 53 -- beating the ts-admiral high-level bot. We’re curious to see how players continue to use ML and whether an ML bot can make the top 10 this year!