Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

Islam, Riashat; Seraj, Raihan; Arnob, Samin Yeasar; Precup, Doina

arXiv:1912.05109 (cs)

[Submitted on 11 Dec 2019]

Abstract: We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step to estimate the value of the new policy after every policy gradient update. Despite the enormous success of off-policy policy gradients on control tasks, existing general methods suffer from high variance and instability, partly because the policy improvement depends on the gradient of the estimated value function. In this work, we present a new approach to off-policy policy evaluation in actor-critic, based on doubly robust estimators. We extend the doubly robust estimator from off-policy policy evaluation (OPE) to actor-critic algorithms that consist of a reward estimator performance model. We find that doubly robust estimation of the critic can significantly improve performance in continuous control tasks. Furthermore, when the reward function is stochastic, which can lead to high variance, doubly robust critic estimation can improve performance under corrupted, stochastic reward signals, indicating its usefulness for robust and safe reinforcement learning.
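
For readers unfamiliar with the doubly robust estimator that the abstract borrows from OPE, the following is a minimal sketch of the step-wise recursion commonly used in that literature. It is not the paper's actor-critic extension, which this listing does not describe. The names `doubly_robust_return`, `q_hat`, and `v_hat`, the integer state and action types, and the trajectory layout are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# One logged transition: (state, action, reward, rho), where rho is the
# importance ratio pi(a|s) / mu(a|s) between target and behaviour policies.
# This layout is a hypothetical choice for the sketch.
Step = Tuple[int, int, float, float]


def doubly_robust_return(
    trajectory: Sequence[Step],
    q_hat: Callable[[int, int], float],
    v_hat: Callable[[int], float],
    gamma: float = 0.99,
) -> float:
    """Step-wise doubly robust estimate of the target policy's return.

    Applies, from the last step backwards,
        V_DR <- v_hat(s) + rho * (r + gamma * V_DR - q_hat(s, a)),
    so the learned model (q_hat, v_hat) supplies a baseline and the
    importance-weighted term corrects the model's error.
    """
    v_dr = 0.0  # estimate beyond the final logged step
    for state, action, reward, rho in reversed(trajectory):
        v_dr = v_hat(state) + rho * (reward + gamma * v_dr - q_hat(state, action))
    return v_dr


if __name__ == "__main__":
    # Two on-policy steps (rho = 1) with a model that predicts zero everywhere:
    # the estimate reduces to the plain discounted return 1 + 0.5 * 2.
    logged = [(0, 0, 1.0, 1.0), (1, 0, 2.0, 1.0)]
    print(doubly_robust_return(logged, lambda s, a: 0.0, lambda s: 0.0, gamma=0.5))
```

Two limiting cases show why the estimator is called doubly robust: with a zero model it falls back to ordinary importance sampling, and with zero importance ratios it falls back to the model's own value prediction.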

Comments:	In Submission; Appeared at NeurIPS 2019 Workshop on Safety and Robustness in Decision Making
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1912.05109 [cs.LG]
	(or arXiv:1912.05109v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1912.05109

Submission history

From: Riashat Islam
[v1] Wed, 11 Dec 2019 04:21:47 UTC (350 KB)
