Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity

Shi, Laixi; Li, Gen; Wei, Yuting; Chen, Yuxin; Chi, Yuejie

arXiv:2202.13890 (cs)

[Submitted on 28 Feb 2022 (v1), last revised 10 Jun 2022 (this version, v2)]

Abstract: Offline or batch reinforcement learning seeks to learn a near-optimal policy using history data without active exploration of the environment. To counter the insufficient coverage and sample scarcity of many offline datasets, the principle of pessimism has been recently introduced to mitigate high bias of the estimated values. While pessimistic variants of model-based algorithms (e.g., value iteration with lower confidence bounds) have been theoretically investigated, their model-free counterparts, which do not require explicit model estimation, have not been adequately studied, especially in terms of sample efficiency. To address this inadequacy, we study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes, and characterize its sample complexity under the single-policy concentrability assumption, which does not require the full coverage of the state-action space. In addition, a variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity. Altogether, this work highlights the efficiency of model-free algorithms in offline RL when used in conjunction with pessimism and variance reduction.
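
To make the idea of pessimism in a model-free update concrete, the following is a minimal tabular sketch, not the authors' algorithm as published. It assumes a finite-horizon MDP with integer-indexed states and actions, an offline dataset given as a list of length-H trajectories of (s, a, r, s_next) tuples, a learning rate of (H + 1) / (H + n), and a lower-confidence penalty scaling as sqrt(H^3 / n) in the visit count n. The function name, the constant c_b, the parameter delta, and the exact form of the logarithmic factor are all illustrative assumptions. Variance reduction is not included.

```python
import numpy as np

def pessimistic_q_learning(episodes, S, A, H, c_b=1.0, delta=0.1):
    """Illustrative Q-learning with a lower-confidence-bound penalty.

    episodes: list of trajectories, each a list of H tuples (s, a, r, s_next).
    c_b and delta are hypothetical tuning knobs, not values from the paper.
    """
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))           # V[H] stays 0 (terminal step)
    N = np.zeros((H, S, A), dtype=int)  # visit counts per (h, s, a)
    policy = np.zeros((H, S), dtype=int)
    # Assumed logarithmic factor for the confidence width.
    log_term = np.log(S * A * H * max(len(episodes), 1) / delta)

    for traj in episodes:
        for h, (s, a, r, s_next) in enumerate(traj):
            N[h, s, a] += 1
            n = N[h, s, a]
            eta = (H + 1) / (H + n)                        # assumed step size
            penalty = c_b * np.sqrt(H**3 * log_term**2 / n)
            # Pessimism: subtract the penalty from the bootstrapped target.
            target = r + V[h + 1, s_next] - penalty
            Q[h, s, a] = (1 - eta) * Q[h, s, a] + eta * target
            # Keep V monotone; switch the policy only when V improves.
            best = Q[h, s].max()
            if best > V[h, s]:
                V[h, s] = best
                policy[h, s] = int(Q[h, s].argmax())
    return Q, V, policy

# Toy dataset: one state, horizon 1. Action 0 is seen once with reward 1.0;
# action 1 is seen 100 times with reward 0.9.
episodes = [[(0, 0, 1.0, 0)]] + [[(0, 1, 0.9, 0)] for _ in range(100)]
Q, V, policy = pessimistic_q_learning(episodes, S=1, A=2, H=1, c_b=0.05)
print(policy[0, 0], Q[0, 0])
```

In this toy dataset the rarely observed action has the higher empirical reward, but its single sample incurs a large penalty, so the learned policy prefers the well-covered action. This is the behavior pessimism is meant to produce when coverage of the state-action space is partial.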

Comments:	International Conference on Machine Learning (ICML), 2022
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2202.13890 [cs.LG]
	(or arXiv:2202.13890v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2202.13890

Submission history

From: Laixi Shi
[v1] Mon, 28 Feb 2022 15:39:36 UTC (403 KB)
[v2] Fri, 10 Jun 2022 21:36:10 UTC (404 KB)
