Do Transformer World Models Give Better Policy Gradients?

Ma, Michel; Ni, Tianwei; Gehring, Clement; D'Oro, Pierluca; Bacon, Pierre-Luc

arXiv:2402.05290 (cs)

[Submitted on 7 Feb 2024 (v1), last revised 11 Feb 2024 (this version, v2)]

Abstract: A natural approach for reinforcement learning is to predict future rewards by unrolling a neural network world model, and to backpropagate through the resulting computational graph to learn a policy. However, this method often becomes impractical for long horizons since typical world models induce hard-to-optimize loss landscapes. Transformers are known to efficiently propagate gradients over long horizons: could they be the solution to this problem? Surprisingly, we show that commonly-used transformer world models produce circuitous gradient paths, which can be detrimental to long-range policy gradients. To tackle this challenge, we propose a class of world models called Actions World Models (AWMs), designed to provide more direct routes for gradient propagation. We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent. We demonstrate that AWMs can generate optimization landscapes that are easier to navigate even when compared to those from the simulator itself. This property allows transformer AWMs to produce better policies than competitive baselines in realistic long-horizon tasks.
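The generic setup described in the abstract's first sentence, unrolling a model and backpropagating through the rollout to update a policy, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scalar dynamics `decay * s + action` stands in for a learned world model, the policy is a single scalar gain `theta`, the reward is the negative squared state, and the function name `rollout_loss_and_grad` is hypothetical. The backward pass is written by hand to make the step-by-step gradient path through the state recursion visible.

```python
def rollout_loss_and_grad(theta, s0, horizon, decay=0.9):
    """Return (loss, dloss/dtheta) for a scalar toy rollout (illustrative only)."""
    # Forward pass: unroll the stand-in model under the policy.
    states = [s0]
    loss = 0.0
    for _ in range(horizon):
        s = states[-1]
        action = theta * s           # toy linear policy (assumption)
        s_next = decay * s + action  # toy stand-in for a learned world model
        loss += s_next ** 2          # negative predicted reward
        states.append(s_next)

    # Backward pass: propagate gradients from the last step back to the first.
    grad_theta = 0.0
    d_future = 0.0  # gradient of later-step losses w.r.t. the state being visited
    for t in reversed(range(horizon)):
        d_next = 2.0 * states[t + 1] + d_future
        grad_theta += d_next * states[t]      # direct path via action = theta * s
        d_future = d_next * (decay + theta)   # path through the state recursion
    return loss, grad_theta


# Usage: a few gradient-descent steps on the policy parameter.
theta = 0.0
for _ in range(5):
    loss, grad = rollout_loss_and_grad(theta, s0=1.0, horizon=20)
    theta -= 0.01 * grad
```

In this sketch, the gradient for an early step is multiplied by `(decay + theta)` once for every later step it passes through. This is one simple way to see why long horizons can make such rollouts hard to optimize.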

Comments:	Michel Ma and Pierluca D'Oro contributed equally
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.05290 [cs.LG]
	(or arXiv:2402.05290v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.05290

Submission history

From: Michel Ma
[v1] Wed, 7 Feb 2024 22:09:46 UTC (2,194 KB)
[v2] Sun, 11 Feb 2024 00:50:25 UTC (2,195 KB)
