Q-learning

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Insp nf (talk | contribs) at 07:08, 1 November 2006 (Robot: Automated text replacement (-a +m)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Q-learning is a reinforcement learning technique that works by learning an action-value function that gives the expected utility of taking a given action in a given state and following a fixed policy thereafter. A strength of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment. A recent variant called delayed Q-learning has shown substantial improvements, bringing PAC bounds to Markov Decision Processes.

Algorithm

The core of the algorithm is a simple value iteration update. For each state, s, from the state set S, and for each action, a, from the action set A, we can calculate an update to its expected discounted reward with the following expression:

Q(s, a) ← Q(s, a) + α [r + φ max_a′ Q(s′, a′) − Q(s, a)]

where r is an observed real reward, α is a convergence rate such that 0 < α < 1, φ is a discount rate such that 0 < φ < 1, and the maximum is taken over the actions a′ available in the successor state s′.
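As a concrete sketch, the update above can be written as a single function over a tabular action-value function stored in a dictionary. The state and action names, step size, and discount value below are illustrative assumptions, with alpha and phi chosen to match the symbols used in the text:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, phi=0.9):
    # One Q-learning step: move Q(s, a) toward the observed reward r
    # plus the discounted value of the best action in the next state.
    old = Q.get((s, a), 0.0)                                     # current estimate of Q(s, a)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)  # max_a' Q(s', a')
    Q[(s, a)] = old + alpha * (r + phi * best_next - old)
    return Q

# Illustrative example: one update starting from an empty table.
Q = {}
q_update(Q, "s0", "a0", r=1.0, s_next="s1", actions=["a0", "a1"])
# Q[("s0", "a0")] is now 0.5 * (1.0 + 0.9 * 0.0) = 0.5
```

Note that the update needs only the observed transition (s, a, r, s′), not the environment's transition probabilities, which is the model-free property described above.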


See also

Category:Machine learning