Tesauro, G.,
"Temporal difference learning and TD-Gammon",
Communications of the ACM, 38, 3, March 1995, pp. 58-68.
58 ;;Quote: TD-Gammon is a self-training neural network for backgammon that outperforms other programs and, sometimes, human experts
| 59 ;;Quote: the goal of temporal difference methods is to match the learner's current prediction for a pattern with its prediction at the next time step
| 59 ;;Quote: deep search cannot be used for backgammon because there are several hundred possible dice-roll and move combinations per ply
| 61 ;;Quote: TD-Gammon is a multilayer perceptron network with 40 hidden units and backgammon feature encoders
| 61+;;Quote: with just raw-encoding, TD-Gammon was a strong intermediate after 200,000 training games
| 65 ;;Quote: TD-Gammon is successful because of the randomness of backgammon and a fairly smooth outcome function
| 65+;;Quote: even with a random initial network, a TD-Gammon game would terminate in, at most, several thousand moves
| 67 ;;Quote: human experts use TD-Gammon to evaluate the best move for a position by playing the position to completion several thousand times
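
The temporal difference idea quoted from page 59 can be sketched as follows. This is a minimal illustration, not Tesauro's implementation. It assumes a tabular value estimate on a hypothetical five-state random walk instead of a neural network on backgammon, and the names `td0_update`, `train_random_walk`, and `alpha` are invented for this sketch.

```python
import random


def td0_update(values, state, next_state, reward, alpha=0.1, terminal=False):
    """Nudge values[state] toward reward + values[next_state] (TD(0), undiscounted).

    When `terminal` is True the next prediction is replaced by the final
    outcome alone, so `next_state` is ignored.
    """
    target = reward + (0.0 if terminal else values[next_state])
    values[state] += alpha * (target - values[state])
    return values[state]


def train_random_walk(episodes=2000, n_states=5, alpha=0.05, seed=0):
    """Learn the probability of exiting on the right in a toy random walk.

    Hypothetical stand-in for self-play: each episode starts in the middle,
    steps left or right at random, and ends with outcome 0 (left exit) or
    1 (right exit).
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # initial predictions
    for _ in range(episodes):
        state = n_states // 2
        while True:
            nxt = state + rng.choice((-1, 1))
            if nxt < 0:  # fell off the left edge: outcome 0
                td0_update(values, state, state, 0.0, alpha, terminal=True)
                break
            if nxt >= n_states:  # fell off the right edge: outcome 1
                td0_update(values, state, state, 1.0, alpha, terminal=True)
                break
            # Non-terminal step: match the current prediction to the next one.
            td0_update(values, state, nxt, 0.0, alpha)
            state = nxt
    return values
```

Calling `train_random_walk()` returns one learned prediction per state. In this toy setting states nearer the right edge should end up with higher values, since they are more likely to exit on the right.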
