Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

Xiong, Zhihan; Shen, Ruoqi; Cui, Qiwen; Fazel, Maryam; Du, Simon S.

arXiv:2102.09703 (cs)

[Submitted on 19 Feb 2021 (v1), last revised 13 Oct 2022 (this version, v5)]

Abstract: We study algorithms that use randomized value functions for exploration in reinforcement learning. Algorithms of this type enjoy appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a Bernstein-type magnitude of noise, we obtain a worst-case $\widetilde{O}\left(H\sqrt{SAT}\right)$ regret bound for episodic time-inhomogeneous Markov Decision Processes, where $S$ is the size of the state space, $A$ is the size of the action space, $H$ is the planning horizon and $T$ is the number of interactions. This bound polynomially improves on all existing bounds for algorithms based on randomized value functions and, for the first time, matches the $\Omega\left(H\sqrt{SAT}\right)$ lower bound up to logarithmic factors. Our result shows that randomized exploration can be near-optimal, which was previously achieved only by optimistic algorithms. To achieve this result, we develop 1) a new clipping operation that ensures both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors, which we use to analyze the regret.
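To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It runs backward induction on an estimated tabular model, perturbs each Q-value with one noise draw `xi` shared across the whole episode, and scales that draw by a Bernstein-style term (empirical variance of the next-step value plus a lower-order `H / n` term). The function names, the exact noise constants, and the simple clipping to `[0, H - h]` are all assumptions made for illustration; the paper's clipping operation and noise magnitude are specified in the full text and may differ.

```python
import math
import random


def sample_episode_noise(rng):
    """Draw the single standard-normal variable reused for one episode (assumed form)."""
    return rng.gauss(0.0, 1.0)


def randomized_values(P_hat, R_hat, counts, H, xi):
    """Backward induction with one shared noise draw and variance-scaled noise.

    P_hat[h][s][a] is a list of estimated next-state probabilities,
    R_hat[h][s][a] an estimated reward, counts[h][s][a] a visit count.
    All of these names are hypothetical, chosen for this sketch.
    """
    S = len(P_hat[0])
    A = len(P_hat[0][0])
    V = [[0.0] * S for _ in range(H + 1)]  # V[H] is the terminal value, all zeros
    Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    for h in reversed(range(H)):
        for s in range(S):
            for a in range(A):
                p = P_hat[h][s][a]
                n = max(counts[h][s][a], 1)  # avoid division by zero
                mean_next = sum(p[t] * V[h + 1][t] for t in range(S))
                var_next = sum(p[t] * (V[h + 1][t] - mean_next) ** 2 for t in range(S))
                # Bernstein-style scale: variance term plus a lower-order term (assumed constants).
                sigma = math.sqrt(var_next / n) + H / n
                q = R_hat[h][s][a] + mean_next + xi * sigma
                # Simple stand-in clipping to the feasible value range at step h.
                Q[h][s][a] = min(max(q, 0.0), float(H - h))
            V[h][s] = max(Q[h][s])
    return Q, V


if __name__ == "__main__":
    H, S, A = 2, 2, 2
    P_hat = [[[[0.5, 0.5] for _ in range(A)] for _ in range(S)] for _ in range(H)]
    R_hat = [[[float(a) for a in range(A)] for _ in range(S)] for _ in range(H)]
    counts = [[[1 for _ in range(A)] for _ in range(S)] for _ in range(H)]
    xi = sample_episode_noise(random.Random(0))
    Q, V = randomized_values(P_hat, R_hat, counts, H, xi)
    print(V[0])
```

Because the same `xi` multiplies every noise term, a positive draw pushes all Q-values up together and a negative draw pushes them down together, which is one way to read the "single random seed in each episode" condition. Setting `xi = 0` recovers plain value iteration on the estimated model.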

Comments:	41 pages, 3 figures, Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2102.09703 [cs.LG]
	(or arXiv:2102.09703v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2102.09703

Submission history

From: Zhihan Xiong
[v1] Fri, 19 Feb 2021 01:42:50 UTC (69 KB)
[v2] Thu, 3 Jun 2021 04:07:14 UTC (1 KB) (withdrawn)
[v3] Mon, 8 Nov 2021 05:45:10 UTC (81 KB)
[v4] Tue, 14 Jun 2022 08:05:54 UTC (78 KB)
[v5] Thu, 13 Oct 2022 01:13:42 UTC (908 KB)