Q-learning

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Insp nf (talk | contribs) at 07:08, 1 November 2006 (Robot: Automated text replacement (-a +m)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Q-learning is a reinforcement learning technique that works by learning an action-value function that gives the expected utility of taking a given action in a given state and following a fixed policy thereafter. A strength of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment. A recent variant called delayed Q-learning has shown substantial improvements, bringing PAC bounds to Markov Decision Processes.

Algorithm

The core of the algorithm is a simple value iteration update. For each state, s, from the state set S, and for each action, a, from the action set A, we can calculate an update to its expected discounted reward with the following expression:

Q(s, a) ← Q(s, a) + α [r + φ max_a′ Q(s′, a′) − Q(s, a)]

where r is an observed real reward, α is a convergence rate such that 0 < α < 1, φ is a discount rate such that 0 < φ < 1, and the maximum is taken over the actions a′ available in the successor state s′.
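As a concrete sketch, the update above can be written as a single function over a tabular action-value function stored in a dictionary. The state and action names, step size, and discount value below are illustrative assumptions, with alpha and phi chosen to match the symbols used in the text:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, phi=0.9):
    # One Q-learning step: move Q(s, a) toward the observed reward r
    # plus the discounted value of the best action in the next state.
    old = Q.get((s, a), 0.0)                                     # current estimate of Q(s, a)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)  # max_a' Q(s', a')
    Q[(s, a)] = old + alpha * (r + phi * best_next - old)
    return Q

# Illustrative example: one update starting from an empty table.
Q = {}
q_update(Q, "s0", "a0", r=1.0, s_next="s1", actions=["a0", "a1"])
# Q[("s0", "a0")] is now 0.5 * (1.0 + 0.9 * 0.0) = 0.5
```

Note that the update needs only the observed transition (s, a, r, s′), not the environment's transition probabilities, which is the model-free property described above.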


See also

Category:Machine learning