Abstract:
This paper describes the results of applying Temporal Difference (TD) learning with a network to opening-game problems in Go. The main difference from other research is that this experiment applies TD learning to the full-sized (19x19) game of Go instead of a simplified version (e.g., the 9x9 game). We discuss and compare TD(λ) learning for predicting the winner of an opening game and for finding the best game among prototypical professional opening games. We also tested the performance of TD(λ) networks by playing them against each other and against commercial Go programs. The empirical result for picking the best game is promising, but there is no guarantee that TD(λ) will always pick the identical opening game regardless of the value of λ. The competition between two TD(λ) networks shows that TD(λ) with a higher λ performs better.
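The abstract does not specify the network or update details; as background, a minimal sketch of one TD(λ) update with accumulating eligibility traces, assuming a simple linear value estimator V(s) = w·φ(s) (the feature map φ and all parameter values here are illustrative, not taken from the report), might look like:

```python
import numpy as np

def td_lambda_step(w, e, phi_t, phi_next, reward,
                   alpha=0.1, gamma=1.0, lam=0.7):
    """One TD(lambda) update with accumulating eligibility traces
    for a linear value function V(s) = w . phi(s).
    All hyperparameter defaults are illustrative assumptions."""
    delta = reward + gamma * (w @ phi_next) - (w @ phi_t)  # TD error
    e = gamma * lam * e + phi_t                            # decay trace, add current features
    w = w + alpha * delta * e                              # credit earlier states via the trace
    return w, e

# Illustrative usage: a single transition with a reward of 1.
w = np.zeros(3)
e = np.zeros(3)
w, e = td_lambda_step(w, e,
                      phi_t=np.array([1.0, 0.0, 0.0]),
                      phi_next=np.array([0.0, 1.0, 0.0]),
                      reward=1.0, alpha=0.5, gamma=1.0, lam=0.5)
```

Higher λ propagates the TD error further back along the trajectory, which relates to the report's observation that a higher λ gave better play.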
Description:
You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the original CITR web site, http://citr.auckland.ac.nz/techreports/, under terms that include this permission. All other rights are reserved by the author(s).