The Uncertainty Bellman Equation and Exploration

O'Donoghue, Brendan; Osband, Ian; Munos, Remi; Mnih, Volodymyr

arXiv:1709.05380 (cs)

[Submitted on 15 Sep 2017 (v1), last revised 22 Oct 2018 (this version, v4)]

Abstract: We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the posterior distribution of the Q-values induced by any policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with complex generalization. Substituting our UBE-exploration strategy for $\epsilon$-greedy improves DQN performance on 51 out of 57 games in the Atari suite.
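
The abstract does not state the equation itself, so the following is only an illustrative sketch of the fixed-point idea, not the authors' implementation. It assumes a small tabular, discounted setting in which the uncertainty u satisfies u(s,a) = nu(s,a) + gamma^2 * sum over (s',a') of P(s'|s,a) pi(a'|s') u(s',a'). The local uncertainty `nu`, the transition tensor `P`, the policy `pi`, and the function name `solve_ube` are all hypothetical placeholders chosen for illustration.

```python
import numpy as np

def solve_ube(P, pi, nu, gamma):
    """Solve u = nu + gamma^2 * M u for a tabular problem (assumed form).

    P:  (S, A, S) transition probabilities P[s, a, s'] (assumed known here)
    pi: (S, A)    policy probabilities pi[s', a']
    nu: (S, A)    non-negative local uncertainty (hypothetical input)
    """
    S, A = nu.shape
    # M[(s,a), (s',a')] = P[s, a, s'] * pi[s', a']
    M = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    # For gamma < 1 the linear system has a unique solution (the fixed point).
    u = np.linalg.solve(np.eye(S * A) - gamma**2 * M, nu.reshape(S * A))
    return u.reshape(S, A)

# Toy problem with random dynamics and a uniform policy (illustrative only).
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S), rows sum to 1
pi = np.full((S, A), 1.0 / A)
nu = rng.uniform(0.0, 1.0, size=(S, A))
u = solve_ube(P, pi, nu, gamma)
```

Under these assumptions, `u` accumulates local uncertainty along the trajectories the policy is expected to follow, which is the sense in which uncertainty propagates across time-steps rather than being judged one step at a time.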

Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:1709.05380 [cs.AI]
	(or arXiv:1709.05380v4 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.1709.05380

Submission history

From: Brendan O'Donoghue
[v1] Fri, 15 Sep 2017 19:55:58 UTC (3,471 KB)
[v2] Fri, 8 Jun 2018 11:47:43 UTC (340 KB)
[v3] Tue, 19 Jun 2018 15:42:36 UTC (340 KB)
[v4] Mon, 22 Oct 2018 15:25:04 UTC (372 KB)
