Post by steadyeddie on Dec 18, 2016 6:17:37 GMT -8
SteadyEddie (in competitions) now runs at about 120,000-250,000 (sometimes 300,000) real rollouts per second. OK, there are complex games where this doesn't hold, but it's true often. Why? Because I've recently started experimenting with the EC2 m4.16xlarge instance, which has 64 cores, so each rollout thread only needs to bring in ~2,000 rollouts per second, and at that point the control thread starts to become the bottleneck (see below).
A detour into EC2 instances (as of Dec 2016). The 64 core systems are new, and for cost effectiveness I recommend spot instances, where you can get an 80+% discount if you are prepared for your instance to die at any time. The finals day cost me $3 for a 64 core box. Even more exciting is the x1.32xlarge instance with 128 cores. But be careful with that one: the spot price really does go up to over $10 an hour (it is usually $1/hour), so you'll need deep pockets or be very careful. Oh, and you have to ask nicely for them too. I said I was exploring ultra high end machine learning performance and that was good enough. I'm sure in years to come 128 cores will look like peanuts, but for now, wow, 128 cores. You need to run there at least once, and hope you get some nice games dealt to you from Tiltyard.
Writing a pipeline which needs to push 150,000 items per second from one thread out to 50, and another which funnels the output of 50 threads back into a single thread, has proven to be an engineering challenge.
First I used "synchronized", which was woefully inadequate for the task. I seem to recall the low 10,000s of lock grabs per second maxed out the control thread. So I tried the java.util.concurrent package classes, which seemed likely to be the right tool for the job. They too maxed out before 100,000 rollouts per second total.
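For anyone wanting to reproduce the experiment, the java.util.concurrent attempt might have looked roughly like this minimal sketch. The class name, worker count, item type, and the "rollout" (a trivial doubling) are all stand-ins for illustration, not the actual engine code; the fan-out/fan-in shape is the point:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BlockingPipeline {

    /** Fan work out to nWorkers threads and funnel results back; returns the summed results. */
    static long run(int nItems, int nWorkers) {
        BlockingQueue<Integer> work = new ArrayBlockingQueue<>(1024);
        BlockingQueue<Integer> results = new ArrayBlockingQueue<>(1024);

        Thread[] workers = new Thread[nWorkers];
        for (int i = 0; i < nWorkers; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        int item = work.take();
                        if (item < 0) break;      // poison pill: shut this worker down
                        results.put(item * 2);    // stand-in for performing a rollout
                    }
                } catch (InterruptedException ignored) { }
            });
            workers[i].start();
        }

        try {
            for (int i = 0; i < nItems; i++) work.put(i);      // control thread: fan out
            for (int i = 0; i < nWorkers; i++) work.put(-1);   // one pill per worker

            long sum = 0;
            for (int i = 0; i < nItems; i++) sum += results.take();  // fan in
            for (Thread t : workers) t.join();
            return sum;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 4));
    }
}
```

Every `put`/`take` here contends on the queue's internal lock, which is why a design like this hits a ceiling long before 150,000 items per second across 50 threads.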
So eventually I wrote simple classes to pass the work items using atomic operations, and overlaid them with hand-written queueing classes. If you do this yourself you will need to use the "volatile" keyword to make sure a value written by one thread actually becomes visible to other threads. Very exciting!
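To give a flavour of the idea (this is my own minimal single-producer/single-consumer sketch, not the actual classes from the player): a fixed ring buffer where each side advances its own volatile index. The volatile write to `tail` publishes the slot to the consumer, and the volatile write to `head` frees it for the producer, so neither thread ever takes a lock:

```java
// Minimal lock-free single-producer/single-consumer ring buffer.
// Capacity must be a power of two so index wrap-around is a cheap mask.
public class SpscQueue {
    private final long[] buffer;
    private final int mask;
    private volatile long head = 0; // next slot the consumer will read
    private volatile long tail = 0; // next slot the producer will write

    public SpscQueue(int capacityPowerOfTwo) {
        buffer = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Producer side only. Returns false if the queue is full. */
    public boolean offer(long value) {
        long t = tail;
        if (t - head == buffer.length) return false;  // full
        buffer[(int) (t & mask)] = value;
        tail = t + 1;  // volatile write: publishes the slot to the consumer
        return true;
    }

    /** Consumer side only. Returns null if the queue is empty. */
    public Long poll() {
        long h = head;
        if (h == tail) return null;                   // empty
        long v = buffer[(int) (h & mask)];
        head = h + 1;  // volatile write: frees the slot for the producer
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        SpscQueue q = new SpscQueue(1024);
        Thread producer = new Thread(() -> {
            for (long i = 1; i <= 100_000; i++)
                while (!q.offer(i)) Thread.onSpinWait();  // spin until a slot frees up
        });
        producer.start();
        long sum = 0;
        for (int n = 0; n < 100_000; ) {
            Long v = q.poll();
            if (v != null) { sum += v; n++; } else Thread.onSpinWait();
        }
        producer.join();
        System.out.println(sum);
    }
}
```

Note this only works with exactly one producer and one consumer per queue; the fan-out/fan-in pipeline needs one such queue per rollout thread in each direction, with the control thread polling across them.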
Going forward it seems as if I may be ahead of the game for now, but better designs are clearly available.