Smoothed Q-learning

Barber, David

arXiv:2303.08631v1 (cs)

[Submitted on 15 Mar 2023]

Abstract: In reinforcement learning, the Q-learning algorithm provably converges to the optimal solution. However, as others have demonstrated, Q-learning can also overestimate values and thereby spend too long exploring unhelpful states. Double Q-learning is a provably convergent alternative that mitigates some of the overestimation issues, though sometimes at the expense of slower convergence. We introduce an alternative algorithm that replaces the max operation with an average, resulting in a provably convergent off-policy algorithm that can mitigate overestimation while retaining convergence similar to that of standard Q-learning.
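
The central idea of the abstract, replacing the max in the Q-learning target with an average, can be illustrated with a minimal Python sketch. The abstract does not say which weights the average uses, so the softmax weighting, the `temperature` parameter, and every function name below are assumptions made for illustration, not the paper's definition.

```python
import math


def softmax_weights(values, temperature):
    """Softmax weights over `values` (an assumed choice of averaging weights).

    A small temperature concentrates the weight on the largest value;
    a large temperature approaches a uniform average.
    """
    top = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def max_target(reward, next_q, gamma):
    """Standard Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(next_q)


def smoothed_target(reward, next_q, gamma, temperature):
    """Averaged target: r + gamma * sum_a' w(a') * Q(s', a').

    The weights w come from softmax_weights, which is a hypothetical
    stand-in for whatever averaging distribution the paper uses.
    """
    weights = softmax_weights(next_q, temperature)
    return reward + gamma * sum(w * q for w, q in zip(weights, next_q))


def td_update(q_sa, target, alpha):
    """Move Q(s, a) a fraction alpha of the way toward the target."""
    return q_sa + alpha * (target - q_sa)


if __name__ == "__main__":
    next_q = [1.0, 2.0, 0.5]  # illustrative action values in the next state
    print("max target:     ", max_target(0.0, next_q, 0.9))
    print("smoothed target:", smoothed_target(0.0, next_q, 0.9, 1.0))
```

Because a weighted average of the next-state values can never exceed their maximum, the smoothed target is at most the max target, which gives some intuition for why averaging can reduce overestimation. As the assumed temperature shrinks toward zero, the smoothed target approaches the standard Q-learning target.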

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2303.08631 [cs.LG] (or arXiv:2303.08631v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2303.08631

Submission history

From: David Barber
[v1] Wed, 15 Mar 2023 13:58:07 UTC (188 KB)
