Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization

Yang, Fan; Zhou, Wenxuan; Liu, Zuxin; Zhao, Ding; Held, David

arXiv:2310.06903 (cs)

[Submitted on 10 Oct 2023 (v1), last revised 14 Jul 2024 (this version, v2)]

Abstract: Safe Reinforcement Learning (RL) plays an important role in applying RL algorithms to safety-critical real-world applications, addressing the trade-off between maximizing rewards and adhering to safety constraints. This work introduces a novel approach that combines RL with trajectory optimization to manage this trade-off effectively. Our approach embeds safety constraints within the action space of a modified Markov Decision Process (MDP). The RL agent produces a sequence of actions that are transformed into safe trajectories by a trajectory optimizer, thereby effectively ensuring safety and increasing training stability. This novel approach excels in its performance on challenging Safety Gym tasks, achieving significantly higher rewards and near-zero safety violations during inference. The method's real-world applicability is demonstrated through a safe and effective deployment in a real robot task of box-pushing around obstacles.
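
The idea of embedding safety in the action space can be illustrated with a toy sketch. This is not the authors' implementation: the 2D point robot, the circular obstacles, the function names `project_out` and `optimize_trajectory`, and all parameter values are assumptions made purely for illustration. The agent's "action" is a sequence of proposed waypoints, and a simple optimizer turns that proposal into waypoints that keep a margin from every obstacle.

```python
import math

def project_out(point, obstacles, margin):
    """Push a 2D point radially out of any circular obstacle it violates.

    obstacles: list of (cx, cy, radius) tuples (hypothetical representation).
    """
    x, y = point
    for cx, cy, r in obstacles:
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        limit = r + margin
        if dist < limit:
            if dist == 0.0:
                # Point sits exactly on the centre: pick an arbitrary direction.
                dx, dy, dist = 1.0, 0.0, 1.0
            scale = limit / dist
            x, y = cx + dx * scale, cy + dy * scale
    return (x, y)

def optimize_trajectory(start, proposed, obstacles, margin=0.1, iters=50, smooth=0.25):
    """Toy stand-in for a trajectory optimizer (an assumption, not the paper's).

    Each iteration blends every waypoint between the agent's proposal and the
    midpoint of its neighbours (smoothness), then projects it out of obstacles.
    Only the waypoints are checked, not the segments between them, and the
    projection is only guaranteed for obstacles whose margins do not overlap.
    """
    traj = [project_out(p, obstacles, margin) for p in proposed]
    for _ in range(iters):
        new_traj = []
        for i, (x, y) in enumerate(traj):
            prev = start if i == 0 else traj[i - 1]
            nxt = traj[i + 1] if i + 1 < len(traj) else (x, y)
            mid_x = (prev[0] + nxt[0]) / 2.0
            mid_y = (prev[1] + nxt[1]) / 2.0
            px, py = proposed[i]
            blended = ((1 - smooth) * px + smooth * mid_x,
                       (1 - smooth) * py + smooth * mid_y)
            new_traj.append(project_out(blended, obstacles, margin))
        traj = new_traj
    return traj

# The agent proposes waypoints that cut straight through an obstacle at (1, 0).
obstacles = [(1.0, 0.0, 0.5)]
proposed = [(0.5, 0.0), (1.0, 0.05), (1.5, 0.0), (2.0, 0.0)]
safe_traj = optimize_trajectory((0.0, 0.0), proposed, obstacles)
```

In this sketch the learning agent never has to reason about the constraint directly: whatever it proposes, the waypoints that would be executed have already passed through the projection step. A real system would also need to check the path between waypoints and the robot's dynamics.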

Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2310.06903 [cs.RO]
	(or arXiv:2310.06903v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2310.06903

Submission history

From: Fan Yang
[v1] Tue, 10 Oct 2023 18:01:16 UTC (12,779 KB)
[v2] Sun, 14 Jul 2024 15:56:37 UTC (17,484 KB)
