Post by steadyeddie on Dec 18, 2016 6:16:07 GMT -8
Back in the summer of 2015 I looked upon the wonders of EC2, and figured I could use the 32-core machine to pimp my player for the Worlds. How wrong I was. SteadyEddie wasn't that bad on an 8-core laptop, but finished a miserable last on 32 cores. Among the reasons for this was that I had done no performance tuning of memory, and I was crushed by timeouts during GC pauses, which were much worse with more cores.
For a while afterwards I struggled to convince Java to GC (without a stop-the-world pause) at a time of my choosing. There is a discussion on StackOverflow about it, which ultimately led me to believe that what I was trying to do was the wrong approach.
Nowadays, following a lot of memory profiling, SteadyEddie hits GC very rarely. The trick is to reuse everything. I have a class whose entire job is to maintain objects for reuse. It's incredibly tiresome, because you have to track down all the places you create objects and route them through it.
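To make the "reuse everything" idea concrete, here is a minimal sketch of what such a reuse class could look like. This is not SteadyEddie's actual code: the name ObjectPool and its methods are made up for illustration, and it assumes a single thread uses the pool (a multi-threaded player would need one pool per thread or some locking).

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical free-list pool: hands back previously released objects
// instead of allocating new ones. Not thread-safe.
final class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created = 0;

    ObjectPool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Reuse a released object if one is available, otherwise allocate.
    T acquire() {
        T obj = free.pollFirst();
        if (obj == null) {
            created++;
            obj = factory.get();
        }
        return obj;
    }

    // Return an object to the pool. The caller must reset its state
    // and must not keep using it afterwards.
    void release(T obj) {
        free.addFirst(obj);
    }

    // Number of objects actually allocated so far.
    int created() {
        return created;
    }
}
```

Every `new Node(...)` in the search then becomes a `pool.acquire()` followed by a reset of the fields, and every discarded node goes back via `pool.release(node)`. Once the pool has warmed up, the search allocates almost nothing, so the collector has very little to do.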
As the basic free profilers like VisualVM don't do memory profiling, I recommend the "YourKit" tool. It's very expensive for an individual, but I managed to convince them GGP counted as academia, which reduced the cost to $120 or so. At the very least it's available as a trial. It kicks butt so hard I even managed to get my tight-fisted employer to shell out. Right now I'm recovering from the awesome fact that I can have a license on my laptop and attach to programs running remotely on an EC2 monster that doesn't need a license of its own.
Coursera suggests G1 for your GC. I don't believe that is actually optimal on today's high-end boxes. Also, my advice when you have a lot of memory available is to have a large eden space and large survivor spaces. Here's my GC config:
nohup java -Xmx24g -XX:NewSize=2000M -XX:SurvivorRatio=1 -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark -cp NN-Tiltyard56C.jar org.ggp.base.player.gamer.statemachine.sample.SteadyEddieCreator
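If you want to check from inside the player how often GC is really hitting under a config like this, the standard java.lang.management beans report collection counts and times. A rough sketch follows; the class name GcStats is made up, and the collector names it prints depend on which collectors your JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads cumulative GC counters from the running JVM.
final class GcStats {
    // Total number of collections across all collectors since JVM start.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    // One line per collector: name, collection count, accumulated time in ms.
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }
}
```

Logging `GcStats.summary()` at the start and end of each move shows whether any collections landed inside your thinking time.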
Finally, one limitation of SteadyEddie, which is high up on my Trello "to do" list, is that it reclaims Nodes after a move while blocking the control thread. This can take 3 seconds, which, after a 3-second safety margin, is 25% of the 12 seconds of CPU time available in a 15-second move! Either I need to do this asynchronously, or wait and do 2 reclaims in the off-move.
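For the asynchronous option, a sketch of the general shape using a single background thread from java.util.concurrent. The class name BackgroundReclaimer is invented for illustration, and it assumes the reclaim task only touches nodes the search can no longer reach, otherwise you need synchronisation that this sketch does not show.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper: runs reclaim work on one daemon thread so the
// control thread can return straight away.
final class BackgroundReclaimer implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "node-reclaimer");
        t.setDaemon(true); // do not keep the JVM alive just for reclaiming
        return t;
    });

    // Queue the reclaim and return immediately. Call get() on the returned
    // Future before reusing anything the task is still working on.
    Future<?> reclaimAsync(Runnable reclaimTask) {
        return worker.submit(reclaimTask);
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Because the executor has a single thread, reclaims run one at a time in the order they were submitted, so two moves in quick succession simply queue up rather than racing each other.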
Edit 23 Dec 2016: I now reclaim in the off move.