Q-learning
'''Q-learning''' is a [[reinforcement learning]] technique that works by learning an action-value function that gives the expected utility of taking a given action in a given state and following a fixed policy thereafter. A strength of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment. A recent variant, delayed Q-learning, has shown substantial improvements, bringing PAC (probably approximately correct) bounds to Markov decision processes.
== Algorithm ==
The core of the algorithm is a simple value iteration update. For each state, ''s'', from the state set ''S'', and for each action, ''a'', from the action set ''A'', we can calculate an update to its expected discounted reward with the following expression:
:<math>Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha \left[ r_{t+1} + \phi \max_{a} Q(s_{t+1}, a) - Q(s_t,a_t) \right]</math>
where ''r'' is an observed real reward, &alpha; is a convergence rate such that 0 < &alpha; < 1, and φ is a discount rate such that 0 < φ < 1.
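
As an illustration, the following is a minimal sketch of this tabular update in Python. The toy chain environment, the epsilon-greedy exploration scheme, and all parameter values are assumptions made for the example rather than part of the algorithm's definition:

<pre>
import random

# A minimal sketch of the tabular Q-learning update described above.
# The toy "chain" environment, the epsilon-greedy exploration and every
# parameter value here are illustrative assumptions, not part of the article.

def q_learning(num_states=5, num_actions=2, alpha=0.5, phi=0.9,
               epsilon=0.1, episodes=200):
    # Q(s, a) table, initialised to zero.
    Q = [[0.0] * num_actions for _ in range(num_states)]

    def step(s, a):
        # Deterministic chain: action 1 moves right, action 0 moves left.
        # A reward of 1 is given only on reaching the rightmost state.
        s_next = min(s + 1, num_states - 1) if a == 1 else max(s - 1, 0)
        return s_next, (1.0 if s_next == num_states - 1 else 0.0)

    for _ in range(episodes):
        s = 0
        while s != num_states - 1:
            # Epsilon-greedy action selection with random tie-breaking.
            if random.random() < epsilon:
                a = random.randrange(num_actions)
            else:
                best = max(Q[s])
                a = random.choice([i for i, q in enumerate(Q[s]) if q == best])
            s_next, r = step(s, a)
            # The update rule from the article:
            # Q(s,a) <- Q(s,a) + alpha * [r + phi * max_a' Q(s',a') - Q(s,a)]
            Q[s][a] += alpha * (r + phi * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

if __name__ == "__main__":
    for s, row in enumerate(q_learning()):
        print(s, [round(q, 3) for q in row])
</pre>

The greedy choice exploits the current estimates most of the time, while the small exploration probability keeps every state-action pair being tried, which the convergence guarantees of Q-learning require.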
== See also ==
* [[Reinforcement learning]]
* [[Prisoner's dilemma#The iterated prisoner.27s dilemma|Iterated prisoner's dilemma]]
* [[Game theory]]
== External links ==
* [http://www.cs.rhul.ac.uk/~chrisw/thesis.html Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England.]
* [http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol2/zmh/article2.html Q-Learning]
* [http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/index.html Q-Learning by examples]
* [http://www.cs.ualberta.ca/%7Esutton/book/the-book.html Reinforcement Learning online book]
* [http://elsy.gdan.pl/index.php Connectionist Q-learning Java Framework]
* [http://www.lifl.fr/~decomite/piqle Piqle: a Generic Java Platform for Reinforcement Learning]
[[Category:Machine learning]]