Single-Trajectory Distributionally Robust Reinforcement Learning

Liang, Zhipeng; Ma, Xiaoteng; Blanchet, Jose; Zhang, Jiheng; Zhou, Zhengyuan

arXiv:2301.11721 (stat)

[Submitted on 27 Jan 2023 (v1), last revised 21 Sep 2024 (this version, v2)]

Abstract: Classical reinforcement learning (RL) relies heavily on the training and test environments being identical. Distributionally Robust RL (DRRL) has been proposed to mitigate this limitation by improving performance across a range of environments, possibly including unknown test environments. The price of this robustness is that DRRL optimizes over a set of distributions, which is inherently more challenging than optimizing over a fixed distribution as in the non-robust case. Existing DRRL algorithms are either model-based or fail to learn from a single sample trajectory. In this paper, we design a first fully model-free DRRL algorithm, called distributionally robust Q-learning with single trajectory (DRQ). We carefully design a multi-timescale framework that fully utilizes each incrementally arriving sample and directly learns the optimal distributionally robust policy without modelling the environment, so the algorithm can be trained along a single trajectory in a model-free fashion. Despite the algorithm's complexity, we provide asymptotic convergence guarantees by generalizing classical stochastic approximation tools. Comprehensive experimental results demonstrate the superior robustness and sample complexity of our proposed algorithm compared to non-robust methods and other robust RL algorithms.
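
The contrast between optimizing over a fixed distribution and over a set of distributions can be illustrated with a single Bellman backup. The following is a minimal sketch and not the DRQ algorithm itself: the finite list of candidate transition distributions, the function names, and all numbers are illustrative assumptions, and the paper's actual uncertainty set is not reproduced here.

```python
# Illustrative sketch only: a one-step robust Bellman backup over a
# small, hand-picked set of transition distributions. The finite
# candidate set is an assumption made for illustration; it is not the
# uncertainty set used in the paper.

def expected_value(probs, values):
    """Expectation of next-state values under one transition distribution."""
    return sum(p * v for p, v in zip(probs, values))

def nominal_backup(reward, gamma, nominal_dist, next_values):
    """Non-robust backup: expectation under a single fixed distribution."""
    return reward + gamma * expected_value(nominal_dist, next_values)

def robust_backup(reward, gamma, candidate_dists, next_values):
    """Robust backup: worst-case expectation over the candidate set."""
    worst = min(expected_value(p, next_values) for p in candidate_dists)
    return reward + gamma * worst

# Hypothetical two-state example: next-state values max_a Q(s', a).
next_values = [0.0, 10.0]
nominal = [0.5, 0.5]
candidates = [[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]

nominal_q = nominal_backup(1.0, 0.5, nominal, next_values)
robust_q = robust_backup(1.0, 0.5, candidates, next_values)
print(nominal_q, robust_q)
```

Because the nominal distribution is one of the candidates, the robust value can never exceed the nominal one. The inner minimization is what makes the robust problem harder: with only a single trajectory of samples, that worst case has to be estimated incrementally rather than enumerated as it is here.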

Comments:	The first two authors contributed equally
Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2301.11721 [stat.ML]
	(or arXiv:2301.11721v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2301.11721

Submission history

From: Zhipeng Liang
[v1] Fri, 27 Jan 2023 14:08:09 UTC (189 KB)
[v2] Sat, 21 Sep 2024 15:32:03 UTC (5,082 KB)
