Concrete info: top 100 with RL/ policy gradients


@mathias just a short while longer... getting bored of hols now which is good
I'm loving the SB2, great ergonomics and the 1060 dgpu is a big plus. The screen is much too glossy. if you want a dev-only (ie some but not serious gaming) machine have a look also at the xps13 or xps 15 linux versions. xps 13 in particular is nearly perfect if you travel a lot . Or the mbp although the xps13 outperforms the mbp on nearly every axis. the x1 carbon is awesome though, cant really go wrong with any of the above (except the mbp) :wink:


I saw this and came up with a (hopefully valid) solution -- you can "pad" the map with more empty space around the edges, to let the gaussian settle down, then truncate the size again. In my case, I add on something like another 20 pixels to each edge, then after applying the radial basis function, I truncate the map again. It seemed to get rid of weird values at the edges. Full disclosure: I accidentally found this approach in this Stackoverflow answer:


Thanks, it definitely looks like an interesting approach. However, it turns out that it was the result of a small miscalculation in my gradient formula that I was using, So I've now fixed that up which has resolved most of the ships running off the screen. There's still a couple that do this that might be fixed by the approach you've mentioned, but I want to train my parameters properly before using more methods.


Here’s another paper for those playing with various optimization strategies


Hey, @arjunvis!

How generous of you sharing your knowledge! Thanks a lot! Made my day (and ruined my night :wink:)
I tried to micro-manage my bots and assigning tasks to them but that made them look even more stupid than before. This approach looks very elegant to me. Have to take a deeper look into this concept and how to generate the flowfield. My math-knowledge on that topic is kinda rusty...


@thorncorona , without your basis function code hard to be sure but definitely your plot looks too discontinuous . Points close together should have similar vectors to each other. It almost looks like you have a division by zero ?


@arjunvis I was wondering how you were able to train parameters quickly using the LBFGS method? I wrote a training program using simulated annealing, but it takes ages to run even 25 runs with 2 bots and 25 with four as one iteration of the algorithm. As a result, it doesn't seem feasible for training using that method. The main restriction seems to be the running time of those 50 instances of halite. I was using the winrate of the haliteBot against my control Bot as the score value for simulated annealing, but it just takes too long to calculate that value. Is there a better approach you could recommend?


@arjunvis First of all thanks for the interesting informations.
I feel quite stupid, but... how did you derive the real values from the gradient to 3d plot the map?


@arjunvis Amazing work! Definitely changed my view of how machine learning applies on real tasks.

I am trying to reimplement your approach and see where it leads to. But I am running into the same problem as @udpatil: the evaluation is just too slow. It takes up to 10 second for one match on my laptop. I saw you did quite some work to speed the process up:

I will start with recompiling the game engine and see where it brings me. I am using the scipy optimizer L-BFGS-B. Just signed up for google cloud platform as well for 24 cores, although I am unsure whether multiple cores help with my problem at all. If you have any other suggestions, please let me know.


Hi @vricci , I start with the scalar potential function - ie i have a function z = f(x,y). it's easy to plot this function, which produces hills and valleys. I can then differentiate this function wrt x and y to get the gradient at any point. That is, i calculate the gradient from the potential function, not the other way around .

The scalar function is a sum of many individual peaks and valleys , as detailed in the long post.

The form of the scalar function is such that it is a) sensible , b) interpretable and c) calculating the gradient of the function is almost free assuming i have already calculated the function itself

Hope rereading the long post with the above context helps- pm me if you have further issues ! cheers -Arj


Hi @udpatil and @JD-ETH
The issue of speed v complexity is a good q and deserves deeper exploration:
This thread on the ML subforum touched on speed issues

My personal thoughts are

  • First thing: i would check how much time your bot uses vs how much time the engine uses. Run a game with full stats output and you will see how many milliseconds each bot takes on average. Eg if a game takes 10 seconds in total , and the sum of average_frame_response_time x number of frames alive for all bots is substantially less than 10 seconds, then you are engine-bound. Its v important to do this as it will give you a sense of what you can realistically do: eg if you take a minute per game, then youre better off relying on good old human artisanal coding, which I still think will win this contest :wink: OTOH if you get down to even a game a second (including parallelism), that's really not bad at all- you can get to sub-100 rank from scratch in a couple of hours.

  • It is always a good idea to speed up the execution engine, particularly for RL based approaches which require running many games - but now you may not have time to rewrite it before the contest ends .

  • More cores are useful, if you use them . Eg with 24 cores, you should be able to run several games in parallel. I am not sure if the python gym does this, i rolled my own game gym in matlab. GCP and AWS are not exactly cheap , buying hundreds of cores for a few days will easily hit hundreds of dollars. So first, do make sure your game testing code is actually running in parallel and maxing out your cores. This , at this point, seems like a better effort than rewriting the engine: ie, make sure you can run many games in parallel, properly, before you shell out for cores :slight_smile:

Having said that, if your model is parsimonious, you can run maybe 100 sets of 72 individual games in a few hours at most, several minutes at best , and get a strong result- you dont need to run hundreds of thousands of games.



Hi @arjunvis,

Thanks for the suggestion!

My average_frame_response pretty much take half of the real time(depends heavily on how many were spawned though), Right now around 12 -18 seconds each on IBM T460p. I also recompiled the environment with O3 and march=native. No significant gains however.

So I started looking at parallel processing. (multi threading) . I wish I could use "parfor" here as in matlab :stuck_out_tongue:

Currently my naive version simply rewrites the file for change of parameters, which is pretty slow and probably won't work with multi-threading. I suppose you created function calls for all this by modifying the game engine?

With joblib and running on 24 cores, my real time for evaluating 72 1v1 games on a 288x192 map became 30 seconds so 0.4 s/ game. Next step run the optimization!



@arjunvis, Thank you for this tutorial!

I may be missing something obvious, but how do we calculate direction from the radial basis function given?

I have the function implemented, but once the "weight" is all summed up, how does one decide which direction to move when rolling down a hill or into a valley?

Thank you!


My math and calculus ain't so hot, so I didn't understand this either.

Are you supposed to calculate distance separately for x and y to get 2 sums? I was calculating distance as distance_squared = (x1 - x2)2 + (y1 - y2)2. But that gives a single value.


Whoops- been busy with work and ignoring halite , so missed all this.

Lets suppose you have m ships and there are n total objects (your ships, enemy ships, slots etc). let's also suppose that every ship and object has attributes x,y g and w (where g is the kernel width, and w is the weight, which is +1 for your ships and -1 otherwise.
The +1 creates a hill, and the -1 creates a valley.

then you are doing just m*n pairwise distance calculations and exponents, something like

grad_x = 0;grad_y= 0;
for s in my_ships
    for o in all_objects
        d = (s.x - o.x)**2 + (s.y-o.y)**2
        f = o.w * math.exp(-o.g * d) 
        grad_x += o.w * o.g * f * (s.x - o.x)
        grad_y += o.w * o.g * f * (s.y - o.y)

For each iteration of the outer loop, grad_x, grad_y are the x,y components of your direction from s (note, the length of grad_x,grady could be large- this is ok, you just want the direction). i wrote this on my phone so check the syntax but the math is good.

kind regards and hope you are enjoying explorations into machine learning !

    grad_y += o.w * o.g * f * (s.x - o.x)

Shouldn't this be (s.y - o.y)?

I really appreciate the clarification on the gradients. Thanks a lot.

What kind of parameter would I need to make a friendly ship run away from being outnumbered?


It currently won't. Enemyships are attraction, so you are running into a bunch of them even harder.


Doh yes, corrected , s.y-o.y of course , thank you @dsbmac



Very late now but curious how you chose your kernel widths. Both that and the exponent for the distance in your function seem like they could also be optimised through reinforcement learning but I'm presuming it would take a lot longer to train. For example, I'd imagine for friendly ships you'd want a small width coupled with a higher power on the distance so it'd produce a steeper gradient but only at shorter distances? Or do you find when training the weights that this isn't an issue?


please kindly share your simple policy-gradient based approach code this is really interesting. Thanks @arjunvis this is amazing.