Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

doi:10.48550/arXiv.2310.02360

Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Its ability to use retained subtask training data for offline learning eliminates the need for new environment interaction during adaptation. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems, offering insights into the inner workings of the subtask composition.

Publication:

arXiv e-prints

Pub Date:

October 2023

DOI:

10.48550/arXiv.2310.02360

arXiv:

arXiv:2310.02360

Bibcode:

2023arXiv231002360R

Keywords:

Computer Science - Artificial Intelligence

E-Print:

Camera ready version

NASA/ADS

Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

Abstract