How Two Sigma Wrote a Machine Learning Halite Bot


#1

While planning for the launch of Halite II, we at Two Sigma knew how important it would be to provide better machine learning (ML) support. Last year we saw an ML bot reach 12th place in the competition. We also heard feedback from the community that some of the game constraints -- technical bandwidth limits as well as limits on how much time bots could use -- had significantly limited ML bots' ability to succeed. The Halite team removed these technical barriers by providing GPU access to some players as part of hosting the game, and by increasing time limits for bot processing. But we also wanted to launch with more starter kit support for machine learning right out of the gate this year.

The Halite team decided we needed machine learning expertise on board to build the ML starter kit. We met with Sat Chatterjee who heads up the Deep Learning team at Two Sigma to evaluate how to proceed. He recruited Piotr Zielinski, an engineer with a strong interest in machine learning, to work on this, and the two of them outlined a plan for a starter bot. You can see the result in Piotr’s tutorial here. You can see the ML starter bot play on the leaderboard as tscommander-ML.

Choosing an ML Approach

With all the excitement around AlphaGo and AlphaGo Zero, it was natural to consider applying reinforcement learning, particularly through self-play, to teach a bot to play Halite. Although that would have been a very interesting research project, the expected complexity of the solution and the computational power needed for training put it out of scope for a starter bot. Instead, Piotr and Sat decided to start with supervised learning, since it would be easier to get such a bot off the ground quickly. It would also allow anyone interested to start playing with the ML starter bot on their laptops in short order.

What should ML do?

When thinking about a supervised learning approach to Halite, Piotr started by selecting a specific question for his ML bot to answer. Halite is a complex game with a branching factor billions of times greater than Go's, so narrowing the scope this way would let the ML focus on a relatively simple problem without grappling with the full complexity of Halite. To start, Piotr gave the bot the question:

“Which planets are the most attractive?”

Any decisions about what ships did once they reached a planet would be defined via procedural code and heuristics, just as in a non-ML bot. Only this one decision would be made by ML, and the answer would be something like "send x% of our ships to planet A, y% to planet B, etc.".

And we’d learn this by looking at a large number of historical games to see what winning bots normally did.

Architecture and Training

Designing a neural network to address this problem proved fairly straightforward, but with one interesting twist. The network received as inputs various attributes of the planets computed from the current game state. But since Halite can have many different maps and different numbers of planets, it made sense to make the network equivariant to permutations of the inputs, i.e., the score assigned to a planet depends solely on that planet's features and not on its position in the input. This makes it harder for the net to overfit to the training data (always a challenge when dealing with neural nets). See the code here for more details about how we normalize planet features within each frame.
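To illustrate the idea, here is a minimal sketch (not the actual starter-kit code; the layer sizes, feature count, and normalization scheme are all assumptions): the same small network is applied to every planet's feature vector, and a softmax over the resulting scores yields the fraction of ships to send to each planet.

    # Minimal sketch of a permutation-equivariant planet scorer in Keras.
    # Dense layers act on the last axis, so the same weights are applied
    # to every planet independently.
    import numpy as np
    import tensorflow as tf

    NUM_FEATURES = 11  # assumed number of per-planet features

    def normalize_frame(features):
        # Assumed scheme: zero-mean/unit-variance per feature across the
        # planets in one frame (the repo's exact normalization may differ).
        mean = features.mean(axis=0, keepdims=True)
        std = features.std(axis=0, keepdims=True) + 1e-6
        return (features - mean) / std

    def build_model(num_planets):
        planets = tf.keras.Input(shape=(num_planets, NUM_FEATURES))
        h = tf.keras.layers.Dense(32, activation="relu")(planets)
        h = tf.keras.layers.Dense(32, activation="relu")(h)
        scores = tf.keras.layers.Dense(1)(h)                 # one score per planet
        scores = tf.keras.layers.Reshape((num_planets,))(scores)
        alloc = tf.keras.layers.Softmax()(scores)            # ship fraction per planet
        return tf.keras.Model(planets, alloc)

Because the weights are shared across planets, permuting the input planets simply permutes the output scores, which is exactly the equivariance property described above.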

Once we’d set up the neural net and starter bot, we needed to start training it. We keep all games ever played in a cloud storage bucket. We created a training function that takes a set of games (we used a pack of 1,000 to start) and trains the bot: it parses the game data, picks out bots worth imitating (e.g., the winners), learns from their planet choices, then saves the trained model for the bot to load during games.
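In outline, the training flow looks something like the sketch below (hedged: parse_replay and winner_allocations are hypothetical stand-ins for the starter kit's actual replay-parsing helpers):

    # Hedged sketch of the training flow; parse_replay and
    # winner_allocations are hypothetical helpers, not starter-kit code.
    import glob
    import numpy as np

    def build_dataset(replay_dir):
        xs, ys = [], []
        for path in glob.glob(replay_dir + "/*.hlt"):
            frames = parse_replay(path)             # hypothetical: one game's frames
            for frame in frames:
                # normalize_frame: per-frame feature scaling, as sketched above
                feats = normalize_frame(frame.planet_features)
                target = winner_allocations(frame)  # hypothetical: fraction of the
                xs.append(feats)                    # winner's ships sent per planet
                ys.append(target)
        return np.array(xs), np.array(ys)

    # x, y = build_dataset("replays/")
    # model.fit(x, y, epochs=10)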

Evaluating the ML Bot

Our most interesting observation from building the ML bot was that it is difficult to know whether the ML is helping at all. With one of our earliest bots, we were very excited to see it do reasonably well against the other bots on the early leaderboard, but when we ran the null test (i.e., replaced the neural network's output with random numbers), we found that the bot continued to do just as well. The procedural code and heuristics surrounding the neural network were carrying the bot; the network itself was adding little value.
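The null test itself is cheap to set up, which is what makes it such a useful sanity check. A sketch (the model call is illustrative):

    # Sketch of the null test: replace the net's planet scores with random
    # numbers and re-run the matches; if bot strength doesn't drop, the ML
    # isn't contributing.
    import numpy as np

    USE_NULL_MODEL = True
    rng = np.random.default_rng()

    def planet_scores(planet_features, model=None):
        if USE_NULL_MODEL or model is None:
            return rng.random(len(planet_features))     # random stand-in scores
        return model.predict(planet_features[None])[0]  # trained network's scores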

This motivated a different evaluation metric for ML going forward -- instead of just comparing the ML bot to other bots on the board, we started comparing ML bots to each other, pitting bots with different amounts of training (including no training) against one another to make sure that the ML was actually adding something.

Another interesting improvement we made to the starter bot was to hard code the random seed into the starter kit. We noticed that every now and then (~1 out of 10 games) the ML bot with less training would win. This happened because of the randomness of the map -- certain situations led to strange behaviors. So that all users would have the same experience when they first used the starter bot, we decided to hard code the seed for more consistency. You can see more about how we did ML vs. ML training here.
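Concretely, pinning the randomness looks something like this (the seed value is illustrative, not the one in the kit; and if the map seed is passed to the game environment on the command line, that value must be pinned as well):

    # Sketch: hard-code the RNG seeds so repeated runs behave identically.
    import random
    import numpy as np

    SEED = 42  # illustrative value; the starter kit fixes its own
    random.seed(SEED)
    np.random.seed(SEED)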

One final issue that Piotr and Sat encountered when writing the ML bot was that pathfinding with the default navigate routine (in the Python starter kit) took far too long and led to timeouts when there were many ships. Since pathfinding was not the main focus, Piotr hacked around it by only using the navigation routine when the bot had sufficient compute budget left, and otherwise resorting to heading in a straight line.
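The workaround looks roughly like this (a sketch: ship.navigate, ship.thrust, and ship.calculate_angle_between follow the Halite II Python starter kit API, speed 7 is the game's maximum ship speed, and the budget constant is an assumption):

    # Sketch of the time-budget fallback: run full pathfinding only while
    # there is turn time left, otherwise steer straight at the target.
    import time

    TURN_BUDGET_SECONDS = 1.5  # assumed safety margin under the turn limit

    def move_ship(ship, target, game_map, turn_start):
        if time.time() - turn_start < TURN_BUDGET_SECONDS:
            command = ship.navigate(target, game_map, speed=7)  # slow but smart
            if command:
                return command
        # Out of budget (or navigate failed): head straight at the target.
        angle = ship.calculate_angle_between(target)
        return ship.thrust(7, angle)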

Next Steps for the Bot

There are many different directions in which we could proceed from the starter bot. For example, sticking with supervised learning, it may be possible to learn to score the different actions a ship may take given the board state and its location on it. It would be natural to use convolutional nets for this purpose. One could also try to learn a global utility (or a global flow) for all the actions all the ships could take as a whole. A more challenging approach would be to learn using reinforcement learning against an existing set of bots or perhaps even through self-play.
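To make the convolutional idea concrete, here is a purely illustrative shape (the feature planes, crop size, and action set are all assumptions, not anything from the starter kit):

    # Illustrative conv net that scores a few discrete actions for the ship
    # at the center of a local crop of the board.
    import tensorflow as tf

    NUM_PLANES = 4    # assumed: my ships, enemy ships, planets, ownership
    NUM_ACTIONS = 6   # assumed: dock plus a handful of movement headings

    def build_action_scorer(crop=32):
        board = tf.keras.Input(shape=(crop, crop, NUM_PLANES))
        h = tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same")(board)
        h = tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same")(h)
        h = tf.keras.layers.GlobalAveragePooling2D()(h)
        logits = tf.keras.layers.Dense(NUM_ACTIONS)(h)
        return tf.keras.Model(board, logits)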

Lessons Learned for Halite III

Writing the starter bot led to a couple of interesting insights about how the competition framework itself could be made friendlier for machine learning:

  1. Right now there are two code paths in the ML starter bot for feature construction. The first is for training, where we read in a replay JSON and build the features. The second is during game play, when we query the game engine's data structures to build the features. This duplication introduced quite a few bugs, so for the next iteration we could expose replays and live game state through the same API (or perhaps more precisely, the live game state could be a "sub-API" of the replay API).

  2. In order to encourage reinforcement learning and other self-play techniques, we need a fast simulator that we can invoke along with one or more bots without crossing process boundaries and doing IO. In other words, the simulator should be a library that we can call into from either Python or C++; a sketch of what that interface might look like follows this list.
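Nothing like this exists in the Halite II codebase today; the following is only a hypothetical shape for such an in-process simulator, with stub bodies to show the intent:

    # Hypothetical in-process simulator API (a wish, not existing code);
    # the method bodies are stubs just to show the intended shape.
    class HaliteSim:
        def __init__(self, width, height, num_players, seed):
            self.width, self.height = width, height
            self.num_players, self.seed = num_players, seed
            self.turn = 0

        def reset(self):
            # Would regenerate the map from the stored seed.
            self.turn = 0
            return self._observe()

        def step(self, moves_per_player):
            # Would apply one turn of moves for every player in-process,
            # with no subprocess launches and no IO.
            self.turn += 1
            done = self.turn >= 300  # stand-in for the real end condition
            return self._observe(), done

        def _observe(self):
            # Would return the full game state; stubbed here.
            return {"turn": self.turn}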

Conclusion

We’re excited to see our starter bot in the top portion of the competition -- as of this writing it is ranked 690 out of the 5,721 players on the board. What’s even more exciting is that over 100 different players are using ML as their bot language, and one ML player has achieved rank 53 -- beating the ts-admiral high-level bot. We’re curious to see how players continue to use ML and whether an ML bot can make the top 10 this year!


#3

Thank you for this competition!
Have you compared the bot to one that only uses ML in the first couple of rounds?
Why did you consider planets in the first place?
You have used the following features in the starter bot:

            planet.health,
            remaining_docking_spots,
            planet.remaining_resources,
            signed_current_production,
            gravity,
            my_best_distance,
            enemy_best_distance,
            ownership,
            distance_from_center,
            health_weighted_ship_distance,
            is_active

Implicitly, you say that your aim is to control resources instead of attacking enemy ships.
The aim is to produce more ships until the end of the 130th round, or to capture all planets before then.
Looking at this deterministically, one can calculate how many ships you need (if you assume a 50-50 destroyed/survived attack ratio) to conquer a planet, given the number of defending and producing ships. Considering the growth in your own production, one can calculate (as a first derivative) a score for how long it takes to regain (break even on) your pre-attack strength. In addition, one needs to factor in the time to reach that planet, or to pass it by for another one, weighted by the distance.
There is just one problem: the ships of your enemy could simply be better directed. Learning from local clashes how to attack a group of ships, and building from that a strategy to maximize ships produced, is in my opinion the most important part of this game, since it has exponential effects (otherwise you could just mimic the enemy player), since ships can tunnel past planets, and since swarming implemented by heuristics is chaotic.
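A rough sketch of that arithmetic (my assumptions: one attacker traded per defender under the 50-50 ratio, and the planet keeps producing while the fleet is in transit):

    # Rough sketch of the break-even reasoning above; all constants and
    # formulas are assumptions, not derived from the actual game rules.
    def ships_needed(defenders, planet_production_per_turn, travel_turns):
        # 50-50 destroyed/survived ratio: expect to trade one ship per
        # defender, plus whatever the planet produces while we fly there.
        return defenders + planet_production_per_turn * travel_turns

    def break_even_turns(ships_lost, production_gain_per_turn):
        # Turns to regain pre-attack strength once the captured planet's
        # production is added to ours (first-derivative view of growth).
        return ships_lost / max(production_gain_per_turn, 1e-9)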


#4

Thank you for your time and effort in creating halite.io and building the ML starter bot, I appreciate it.

Since the purpose of Halite is to recruit (and have fun), I'm wondering if you may want to outline a learning path to next year's competition, to gain more interest in ML?

The sheer amount of high-quality information (paid and free) about AI/ML makes the task of gaining insight/knowledge somewhat daunting. Being an industry leader in the ML field (i.e., Halite) gives Two Sigma the opportunity to build a well-defined path for those interested in developing a career in the FinAI field.

I would be willing to help develop/use such a path. Contact me if you'd like to discuss this further; I'm local to NYC.


#5

@Cok3Zer0 A few quick thoughts from our team:

  • This was meant to be just a starter bot -- not the definitive Halite bot, but something to help people get started and explore their own ideas. So we didn't want to keep building a pure ML bot; we left that for players to pursue on their own.

  • We picked our focus to keep the effort of building the bot down and to maintain simplicity. We agree that some of the strategies you suggest could be valuable, but they were potentially too time intensive.

Thanks for the thoughts!

@snavazio feel free to email us at halite@halite.io :slight_smile:


#6

Great post, very informative. This would have been even better to read while the competition was live -- some useful info about the ML development process in there.

Halite got me to pick back up learning ai. The starter bot was really helpful in seeing an actual implementation and tinkering with it. I'm thankful you guys took the time to put one together.

As I learned more AI (almost from scratch) over the past few months, I realized that: a) 'solving' Halite itself is a more difficult challenge than I had realized, and b) the current API for the game is not super friendly for ML development.

Learned a lot and had lots of fun. Looking forward to the next season!