Expected Sarsa($\lambda$) with Control Variate for Variance Reduction

Yang, Long; Zhang, Yu; Wen, Jun; Zheng, Qian; Li, Pengfei; Pan, Gang

Computer Science > Machine Learning

arXiv:1906.11058 (cs)

[Submitted on 25 Jun 2019 (v1), last revised 6 Sep 2019 (this version, v2)]

Abstract: Off-policy learning is powerful for reinforcement learning. However, the high variance of off-policy evaluation is a critical challenge that can cause off-policy learning to become uncontrollably unstable. In this paper, to reduce this variance, we introduce the control variate technique to $\mathtt{Expected}$ $\mathtt{Sarsa}$($\lambda$) and propose a tabular $\mathtt{ES}$($\lambda$)-$\mathtt{CV}$ algorithm. We prove that if a proper estimator of the value function is attained, the proposed $\mathtt{ES}$($\lambda$)-$\mathtt{CV}$ enjoys a lower variance than $\mathtt{Expected}$ $\mathtt{Sarsa}$($\lambda$). Furthermore, to extend $\mathtt{ES}$($\lambda$)-$\mathtt{CV}$ to a convergent algorithm with linear function approximation, we propose the $\mathtt{GES}$($\lambda$) algorithm under a convex-concave saddle-point formulation. We prove that the convergence rate of $\mathtt{GES}$($\lambda$) is $\mathcal{O}(1/T)$, which matches or outperforms many state-of-the-art gradient-based algorithms while requiring a more relaxed condition. Numerical experiments show that the proposed algorithm performs better, with lower variance, than several state-of-the-art gradient-based TD learning algorithms: $\mathtt{GQ}$($\lambda$), $\mathtt{GTB}$($\lambda$), and $\mathtt{ABQ}$($\zeta$).

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1906.11058 [cs.LG]
	(or arXiv:1906.11058v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1906.11058

Submission history

From: Long Yang
[v1] Tue, 25 Jun 2019 11:35:44 UTC (421 KB)
[v2] Fri, 6 Sep 2019 12:38:02 UTC (449 KB)
