|
Post by steadyeddie on Aug 10, 2015 14:43:02 GMT -8
Yesterday I spotted that when I expanded a node and discovered a terminal node, I wasn't doing anything with the information. Sounded daft to me.
So I figured I'd fake up a rollout result (like when I select a terminal node). Now my player plays like a drunk monkey with a machine gun. I'm pretty sure on average it's doing better, and I _think_ it's really good near the end of a game, but the strategy has gone out the window.
Any views?
|
|
|
Post by Steve Draper on Aug 10, 2015 15:01:31 GMT -8
Unconditionally faking a rollout when you expand and find ONE child to be terminal is not necessarily a good thing, since you are short-circuiting the question of whether that node would be likely to be selected and hence would normally contribute a rollout at all. It actually depends on its goal value. If it has a high goal value for the role that would be choosing from that node then it's probably good.
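To make the "only when it's good for the chooser" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration rather than anyone's actual player: the function name `fake_rollout_if_good`, the `threshold` parameter, and the representation of the selection path as a list of stats dicts with `visits` and per-role `totals`.

```python
from typing import Dict, List


def fake_rollout_if_good(path: List[Dict], child_goals: List[int],
                         chooser: int, threshold: int = 100) -> bool:
    """Back up a terminal child's goals as if they were a rollout result,
    but only if the role choosing at the expanded node likes that outcome.

    path        -- hypothetical per-node stats, root first, expanded node last
    child_goals -- goal value per role at the terminal child
    chooser     -- index of the role that moves at the expanded node
    """
    if child_goals[chooser] < threshold:
        # Bad or mediocre for the chooser: they would probably not pick
        # this child, so don't let it skew the averages.
        return False
    for stats in path:
        stats["visits"] += 1
        for role, goal in enumerate(child_goals):
            stats["totals"][role] += goal
    return True
```

A lower `threshold` makes the fake rollout fire more often; where to set it is a tuning question this sketch does not answer.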
For reference, the way Sancho uses this information (and you're right, it can be very useful) is to see if the terminal value would propagate unconditionally. That is, a maximal score [typically 100] for the role that has the choice at the parent node should always be taken. In such cases we mark the parent node (the one you were expanding) as 'complete' (pseudo-terminal in effect) and set its score to that of the child in question. This removes the need to ever explore it again (it's a done deal - you should always select the terminal child you just found). We apply this back up the tree, completing ancestors if either the choice should always be taken or all children are now complete (so no further information can be gained by selecting through it). This propagation of 'completeness' saves a ton of wasted time, but it gets tricky in the all-children-complete case (if none of them were wins, choose the highest) and especially (like MUCH more complicated) in simultaneous move games.
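A rough sketch of that propagation, for the simple turn-taking case only (one role chooses at each node). This is not Sancho's code: the `Node` class, its fields, and the helper names are all made up for illustration, and simultaneous-move games are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_SCORE = 100  # assumed maximal goal value


@dataclass
class Node:
    mover: int                          # role that chooses at this node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    fully_expanded: bool = False        # every legal move has a child node
    complete: bool = False
    scores: Optional[List[int]] = None  # per-role scores once complete


def add_child(parent: Node, mover: int) -> Node:
    child = Node(mover=mover, parent=parent)
    parent.children.append(child)
    return child


def mark_terminal(node: Node, goals: List[int]) -> None:
    """Record a terminal node's goals and push completeness up the tree."""
    node.complete = True
    node.scores = list(goals)
    propagate_completeness(node)


def propagate_completeness(child: Node) -> None:
    parent = child.parent
    while parent is not None and not parent.complete:
        m = parent.mover
        if child.scores[m] == MAX_SCORE:
            # The mover can force a maximal score: always take this child.
            parent.scores = list(child.scores)
        elif parent.fully_expanded and all(c.complete for c in parent.children):
            # Nothing left to learn below: the mover takes its best child.
            best = max(parent.children, key=lambda c: c.scores[m])
            parent.scores = list(best.scores)
        else:
            break  # parent stays open, so nothing further up changes
        parent.complete = True
        child, parent = parent, parent.parent
```

Selection would then skip complete nodes (or treat them as having a fixed value) instead of running rollouts through them.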
|
|
|
Post by steadyeddie on Aug 10, 2015 15:21:28 GMT -8
I'll try just faking the rollout result when it is good news.
I do have a lot of code to propagate confidence=100. You're right, it took ages to get right (if it is right now, who knows).
|
|