Q-learning
Q-learning is a reinforcement learning technique that works by learning an action-value function that gives the expected utility of taking a given action in a given state and following a fixed policy thereafter. A strength of Q-learning is that it can compare the expected utility of the available actions without requiring a model of the environment. A variation called delayed Q-learning has shown substantial improvements, bringing probably approximately correct (PAC) bounds to Markov decision processes.
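As a minimal sketch of what comparing the expected utility of the available actions can look like, the Python snippet below keeps action values in a nested dictionary and picks the highest-valued action in a state. No model of the environment is consulted, only the stored estimates. The table contents, the state and action names, and the helper `greedy_action` are hypothetical and chosen purely for illustration.

```python
# Hypothetical action-value table: Q[state][action] -> estimated expected utility.
Q = {
    "s0": {"left": 0.2, "right": 0.7},
    "s1": {"left": 0.5, "right": 0.1},
}

def greedy_action(q_table, state):
    """Return the action with the highest estimated value in `state`."""
    actions = q_table[state]
    return max(actions, key=actions.get)

print(greedy_action(Q, "s0"))
print(greedy_action(Q, "s1"))
```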
Algorithm
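The core of the algorithm is a simple value iteration update. For each state s from the state set S, and for each action a from the action set A, we can calculate an update to the expected discounted reward of that state-action pair with the following expression:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \phi \max_{a'} Q(s', a') - Q(s, a) \right)$$
where r is an observed real reward, s' is the state reached after taking action a in state s, α is a convergence rate such that 0 < α < 1, and φ is a discount rate such that 0 < φ < 1.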
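A tabular version of this update can be sketched in a few lines of Python. The sketch assumes the standard Q-learning rule, in which the current estimate Q(s, a) is moved a fraction α toward the target r + φ max Q(s', a'), with the maximum taken over the actions a' available in the next state s'. The function name `q_update`, the dictionary-based table, the default parameter values and the toy transition are assumptions made for illustration, not part of any particular library.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, phi=0.9):
    """Apply one Q-learning update to Q[(s, a)] in place and return the new value.

    alpha is the convergence (learning) rate, phi the discount rate.
    """
    # Best estimated value obtainable from the next state.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    # Move the old estimate a fraction alpha toward the observed target.
    Q[(s, a)] += alpha * (r + phi * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)        # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]   # hypothetical action set

# The same hypothetical transition observed twice:
# in state 0, action "right" yields reward 1.0 and leads to state 1.
v1 = q_update(Q, 0, "right", 1.0, 1, actions)
v2 = q_update(Q, 0, "right", 1.0, 1, actions)
print(v1, v2)
```

With every estimate starting at zero, the first call gives 0 + 0.5 × (1.0 + 0.9 × 0 − 0) = 0.5 and the second gives 0.5 + 0.5 × (1.0 + 0.9 × 0 − 0.5) = 0.75, so repeated observations pull the estimate toward the target at a speed set by α.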
See also
External links
- Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England.
- Q-Learning
- Q-Learning by examples
- Reinforcement Learning online book
- Connectionist Q-learning Java Framework
- Piqle: a Generic Java Platform for Reinforcement Learning