On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

Bedi, Amrit Singh; Chakraborty, Souradip; Parayil, Anjaly; Sadler, Brian; Tokekar, Pratap; Koppel, Alec

arXiv:2201.12332 (cs)

[Submitted on 28 Jan 2022 (v1), last revised 31 Jan 2022 (this version, v2)]

Abstract: We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function associated with a policy is bounded, which fails to hold even for Gaussian policies. To properly address this issue, one must introduce an exploration tolerance parameter to quantify the region in which it is bounded. Doing so incurs a persistent bias that appears in the attenuation rate of the expected policy gradient norm, which is inversely proportional to the radius of the action space. To mitigate this hidden bias, heavy-tailed policy parameterizations may be used, which exhibit a bounded score function, but doing so can cause instability in algorithmic updates. To address these issues, in this work, we study the convergence of policy gradient algorithms under heavy-tailed parameterizations, which we propose to stabilize with a combination of mirror ascent-type updates and gradient tracking. Our main theoretical contribution is the establishment that this scheme converges with constant step and batch sizes, whereas prior works require these parameters to respectively shrink to null or grow to infinity. Experimentally, this scheme under a heavy-tailed policy parameterization yields improved reward accumulation across a variety of settings as compared with standard benchmarks.
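
The contrast between an unbounded and a bounded score function can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a one-dimensional policy whose only parameter is its mean (or location), and it uses the Cauchy distribution as one example of a heavy-tailed parameterization. The function names `gaussian_score_mean` and `cauchy_score_loc` are hypothetical.

```python
# Assumption: 1-D policies parameterized only by their mean/location.
# The Cauchy family is used here as an illustrative heavy-tailed choice.

def gaussian_score_mean(action, mean, sigma):
    """Score of N(mean, sigma^2) w.r.t. the mean: (action - mean) / sigma^2."""
    return (action - mean) / sigma**2


def cauchy_score_loc(action, loc, scale):
    """Score of Cauchy(loc, scale) w.r.t. loc: 2 d / (scale^2 + d^2), d = action - loc."""
    d = action - loc
    return 2.0 * d / (scale**2 + d**2)


if __name__ == "__main__":
    # The Gaussian score grows linearly in the action; the Cauchy score
    # never exceeds 1 / scale in magnitude (attained at d = scale).
    for a in (1.0, 10.0, 100.0, 1000.0):
        g = gaussian_score_mean(a, 0.0, 1.0)
        c = cauchy_score_loc(a, 0.0, 1.0)
        print(f"action={a:8.1f}  gaussian={g:10.3f}  cauchy={c:8.5f}")
```

Under these assumptions the Gaussian score is bounded only if the actions are restricted to a bounded region, which is the role the exploration tolerance parameter plays in the abstract, while the Cauchy score is bounded over the whole real line.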

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Cite as: arXiv:2201.12332 [cs.LG] (or arXiv:2201.12332v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2201.12332

Submission history

From: Amrit Singh Bedi
[v1] Fri, 28 Jan 2022 18:54:30 UTC (1,316 KB)
[v2] Mon, 31 Jan 2022 03:40:59 UTC (1,314 KB)
