|
Post by steadyeddie on Dec 18, 2016 6:18:57 GMT -8
The more I write, the more I realise that having a single control thread is clearly barmy in the long run (in a world of high CPU core counts). Oh well, it's what I've got for now.
Turn on a profiler against your Player for the first time and you'll find a bunch of crass performance bugs. Once I'd taken mine out, for a game where the control thread was the bottleneck (C4 is my favorite), I found the following:
⦁ If you are blocked in methods near a synchronized block, it's the synchronization that is costing you. Find another way.
⦁ You need to be very careful with processing results and back propagation (see later).
⦁ Actually, posting and reading from the worker queues is not that expensive if you get the locking right.
⦁ Selection of nodes is surprisingly expensive. First up, I discovered an inaccurate but fast version of Math.log that has helped me a lot. Caching values to avoid recalculation helped too.
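For anyone wondering what an "inaccurate but fast" Math.log might look like, here is a minimal sketch. It is not the version referred to above: the class name FastLog, the 4096-entry table and the ucb1 helper are all assumptions made for illustration. The idea is to read the exponent straight out of the double's bit pattern and look up the log of the mantissa in a small precomputed table, so no transcendental function is called on the hot path. The ucb1 helper assumes the standard UCB1 formula and shows the caching point: the log of the parent's visit count is computed once per selection and reused for every child.

```java
final class FastLog {
    // Hypothetical table size: 12 bits of mantissa gives an absolute error below ~1.3e-4.
    private static final int TABLE_BITS = 12;
    private static final int TABLE_SIZE = 1 << TABLE_BITS;
    private static final double LN2 = Math.log(2.0);
    private static final double[] MANTISSA_LOG = new double[TABLE_SIZE];

    static {
        for (int i = 0; i < TABLE_SIZE; i++) {
            // Store the log at the midpoint of each bucket to halve the worst-case error.
            MANTISSA_LOG[i] = Math.log(1.0 + (i + 0.5) / TABLE_SIZE);
        }
    }

    private FastLog() {}

    /** Approximate natural log. Only valid for positive, finite, normal doubles. */
    static double log(double x) {
        long bits = Double.doubleToRawLongBits(x);
        // Unbiased binary exponent: x = mantissa * 2^exponent, with mantissa in [1, 2).
        int exponent = (int) ((bits >>> 52) & 0x7FF) - 1023;
        // Top TABLE_BITS bits of the 52-bit mantissa select the table bucket.
        int index = (int) ((bits >>> (52 - TABLE_BITS)) & (TABLE_SIZE - 1));
        return exponent * LN2 + MANTISSA_LOG[index];
    }

    /**
     * Standard UCB1 score (an assumption, not necessarily what any player here uses).
     * logParentVisits is passed in so the caller computes it once per selection,
     * not once per child.
     */
    static double ucb1(double meanScore, int childVisits, double logParentVisits, double c) {
        return meanScore + c * Math.sqrt(logParentVisits / childVisits);
    }
}
```

In a selection loop you would call FastLog.log(parentVisits) once, then pass the result to ucb1 for each child. Whether the lost accuracy matters depends on how sensitive your exploration term is, so profile before and after.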
|
|
|
Post by Andrew Rose on Jan 27, 2017 13:57:48 GMT -8
Node selection is very expensive for Sancho too. I'm pretty sure I tried the cheap and cheerful version of Math.log (and friends) that was mentioned in the chat channel during the last competition. Sadly, for Sancho, I don't think I saw any difference in performance, even for C4 (which is also my go-to game for a tree-thread bottleneck).
|
|
|
Post by steadyeddie on Jan 30, 2017 16:00:45 GMT -8
To put a figure on how fast my selection is now: towards the end of a game, when I'm faking a lot of rollouts and don't have to bother the OS with annoying context switching, I get 500K-1M selections per second. I'm not as good when I'm full-on sending to and receiving from worker threads. On C4 I'm seeing (using your propnet at 40-50K rollouts/second/thread) a total throughput of ~200K rollouts/sec in the tree.
|
|