Model-Free Trajectory-based Policy Optimization with Monotonic Improvement

Akrour, Riad; Abdolmaleki, Abbas; Abdulsamad, Hany; Peters, Jan; Neumann, Gerhard

arXiv:1606.09197 (cs)

[Submitted on 29 Jun 2016 (v1), last revised 2 Jul 2018 (this version, v4)]

Abstract: Many recent trajectory optimization algorithms alternate between a linear approximation of the system dynamics around the mean trajectory and a conservative policy update. One way of constraining the policy change is by bounding the Kullback-Leibler (KL) divergence between successive policies. These approaches have already demonstrated great experimental success in challenging problems such as end-to-end control of physical systems. However, the linear approximation of the system dynamics can introduce a bias in the policy update and prevent convergence to the optimal policy. In this article, we propose a new model-free trajectory-based policy optimization algorithm with guaranteed monotonic improvement. The algorithm backpropagates a local, quadratic and time-dependent Q-function learned from trajectory data instead of a model of the system dynamics. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics. On highly non-linear control tasks, we experimentally demonstrate the improved performance of our algorithm compared to approaches that linearize the system dynamics. In order to show the monotonic improvement of our algorithm, we additionally conduct a theoretical analysis of our policy update scheme to derive a lower bound on the change in policy return between successive iterations.
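
As a rough illustration of what bounding the KL divergence between successive policies means, the sketch below evaluates the closed-form KL divergence between two scalar linear-Gaussian policies and checks it against a threshold. It is not the paper's algorithm: the scalar policy form a ~ N(gain * s + bias, sigma^2), the function names, the averaging over a list of sampled states, the direction KL(new || old), and the threshold `epsilon` are all assumptions made for illustration.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) for univariate Gaussians
    p = N(mu_p, sigma_p^2) and q = N(mu_q, sigma_q^2)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def policy_kl(new, old, state):
    """KL(new || old) at a single state for hypothetical scalar
    linear-Gaussian policies given as (gain, bias, sigma) tuples."""
    gain_new, bias_new, sigma_new = new
    gain_old, bias_old, sigma_old = old
    return gaussian_kl(
        gain_new * state + bias_new, sigma_new,
        gain_old * state + bias_old, sigma_old,
    )


def satisfies_kl_bound(new, old, states, epsilon):
    """Check whether the KL averaged over sampled states stays below epsilon.
    Averaging over states is an assumption of this sketch."""
    average_kl = sum(policy_kl(new, old, s) for s in states) / len(states)
    return average_kl <= epsilon
```

For example, with `old = (1.0, 0.0, 1.0)` and `new = (1.5, 0.0, 1.0)`, the two action means differ by 1.0 at state 2.0, so the KL at that state is 0.5 and a bound of `epsilon = 0.1` would reject the update.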

Subjects:	Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:1606.09197 [cs.LG]
	(or arXiv:1606.09197v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1606.09197

Submission history

From: Riad Akrour
[v1] Wed, 29 Jun 2016 17:39:09 UTC (936 KB)
[v2] Fri, 12 May 2017 08:35:57 UTC (920 KB)
[v3] Thu, 29 Jun 2017 09:07:37 UTC (921 KB)
[v4] Mon, 2 Jul 2018 12:40:05 UTC (2,052 KB)
