Is Pessimism Provably Efficient for Offline RL?

Jin, Ying; Yang, Zhuoran; Wang, Zhaoran

Computer Science > Machine Learning

arXiv:2012.15085v1 (cs)

[Submitted on 30 Dec 2020 (this version), latest version 4 May 2022 (v3)]

Abstract: We study offline reinforcement learning (RL), which aims to learn an optimal policy from a dataset collected a priori. Because no further interaction with the environment is possible, offline RL suffers from insufficient coverage of the dataset, an issue that most existing theoretical analyses do not handle. In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function. This penalty function simply flips the sign of the bonus function used to promote exploration in online RL, which makes PEVI easy to implement and compatible with general function approximators.
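For concreteness, here is a minimal sketch of PEVI specialized to a linear MDP, written in Python with NumPy. Everything beyond the abstract is our assumption, not the authors' code: the hypothetical function name `pevi_linear`, the known feature map `phi`, the per-step dataset layout, the ridge parameter `lam`, and treating the penalty scale `beta` as a free hyperparameter (the paper derives its own choice from the analysis). The penalty `beta * sqrt(phi^T Lambda_h^{-1} phi)` is the elliptical exploration bonus of optimistic least-squares value iteration with its sign flipped, which is the point the abstract makes.

```python
import numpy as np


def pevi_linear(phi, dataset, H, d, actions, beta, lam=1.0):
    """Sketch of pessimistic value iteration (PEVI) for a linear MDP (hypothetical API).

    phi(s, a) -> array of shape (d,): assumed known feature map.
    dataset[h]: list of (s, a, r, s_next) transitions logged at step h (0-indexed).
    Returns (policy, value) with policy(h, s) -> action and value(h, s) -> float.
    """
    weights = [None] * H   # ridge-regression weights w_h
    lam_invs = [None] * H  # inverse Gram matrices Lambda_h^{-1}

    def q_hat(h, s, a):
        f = phi(s, a)
        mean = f @ weights[h]
        # Penalty Gamma_h(s, a): the online-RL exploration bonus, subtracted instead of added.
        penalty = beta * np.sqrt(f @ lam_invs[h] @ f)
        # Truncate to [0, H - h], the range of returns remaining from step h.
        return float(np.clip(mean - penalty, 0.0, H - h))

    def value(h, s):
        if h == H:  # terminal step: V_{H+1} = 0 in 1-indexed notation
            return 0.0
        return max(q_hat(h, s, a) for a in actions)

    # Backward induction: step h regresses r + V_{h+1}(s_next) onto phi(s, a).
    for h in reversed(range(H)):
        feats = np.array([phi(s, a) for (s, a, _, _) in dataset[h]], dtype=float).reshape(-1, d)
        targets = np.array([r + value(h + 1, s2) for (_, _, r, s2) in dataset[h]], dtype=float)
        lam_invs[h] = np.linalg.inv(feats.T @ feats + lam * np.eye(d))
        weights[h] = lam_invs[h] @ (feats.T @ targets)

    def policy(h, s):
        return max(actions, key=lambda a: q_hat(h, s, a))

    return policy, value


# Toy usage (made-up data): one state, two actions, H = 1, one-hot features (a tabular MDP
# written as a linear MDP). Action 0 is logged 100 times with reward 0.5; action 1 only
# 3 times with reward 1.0.
def one_hot(s, a):
    return np.eye(2)[a]


toy_data = [[(0, 0, 0.5, 0)] * 100 + [(0, 1, 1.0, 0)] * 3]
pess_pi, pess_v = pevi_linear(one_hot, toy_data, H=1, d=2, actions=[0, 1], beta=1.0)
plain_pi, _ = pevi_linear(one_hot, toy_data, H=1, d=2, actions=[0, 1], beta=0.0)
print(pess_pi(0, 0), plain_pi(0, 0))  # prints "0 1"
```

Setting `beta=0.0` removes the penalty and reduces the sketch to plain least-squares value iteration, which in this toy data is drawn to the rarely observed action; the penalty steers the policy back to the well-covered one. Inverting `Lambda_h` explicitly is acceptable for small `d`; a Cholesky-based solve would be preferable at scale.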
Without assuming sufficient coverage of the dataset, we establish a data-dependent upper bound on the suboptimality of PEVI for general Markov decision processes (MDPs). When specialized to linear MDPs, this bound matches the information-theoretic lower bound up to multiplicative factors of the dimension and horizon. In other words, pessimism is not only provably efficient but also minimax optimal. In particular, given the dataset, the learned policy serves as the "best effort" among all policies, as no other policy can do better. Our theoretical analysis identifies the critical role of pessimism in eliminating a notion of spurious correlation, which arises from "irrelevant" trajectories that are poorly covered by the dataset and uninformative about the optimal policy.
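The abstract does not state the bound itself. As we read the full paper, the general-MDP result takes roughly the form below; the notation (Bellman operator $\mathbb{B}_h$, its empirical estimate $\widehat{\mathbb{B}}_h$, penalty $\Gamma_h$, failure probability $\xi$, ridge parameter $\lambda$, $K$ logged trajectories) is our assumption and should be checked against the paper before being relied on.

```latex
% Suboptimality of the learned policy \widehat{\pi} from initial state x:
\mathrm{SubOpt}(\widehat{\pi}; x) = V_1^{\pi^*}(x) - V_1^{\widehat{\pi}}(x).
% Assumed: \Gamma_h is a \xi-uncertainty quantifier, i.e. with probability at least 1 - \xi,
% for all (x, a) and h = 1, \dots, H,
\left| (\widehat{\mathbb{B}}_h \widehat{V}_{h+1})(x,a) - (\mathbb{B}_h \widehat{V}_{h+1})(x,a) \right| \le \Gamma_h(x,a).
% Then, on that event,
\mathrm{SubOpt}(\widehat{\pi}; x) \le 2 \sum_{h=1}^{H} \mathbb{E}_{\pi^*}\!\left[ \Gamma_h(s_h, a_h) \,\middle|\, s_1 = x \right].
% Linear-MDP instantiation (feature map \phi):
\Lambda_h = \sum_{\tau=1}^{K} \phi(x_h^\tau, a_h^\tau)\,\phi(x_h^\tau, a_h^\tau)^\top + \lambda I,
\qquad
\Gamma_h(x,a) = \beta \left( \phi(x,a)^\top \Lambda_h^{-1} \phi(x,a) \right)^{1/2}.
```

The bound is data-dependent because $\Gamma_h$ is averaged only over the trajectory distribution of the optimal policy $\pi^*$: poorly covered state-action pairs that $\pi^*$ rarely visits barely enter it, which matches the abstract's point about "irrelevant" trajectories.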

Comments:	53 pages, 3 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:2012.15085 [cs.LG]
	(or arXiv:2012.15085v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2012.15085

Submission history

From: Zhuoran Yang
[v1] Wed, 30 Dec 2020 09:06:57 UTC (77 KB)
[v2] Wed, 12 May 2021 15:05:39 UTC (81 KB)
[v3] Wed, 4 May 2022 22:30:48 UTC (491 KB)
