On Double Descent in Reinforcement Learning with LSTD and Random Features

Brellmann, David; Berthier, Eloïse; Filliat, David; Frehse, Goran

arXiv:2310.05518 (cs)

[Submitted on 9 Oct 2023 (v1), last revised 18 Feb 2024 (this version, v4)]

Abstract: Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While the over-parameterized regime and its benefits are well understood in supervised learning, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and $l_2$-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime in which this ratio is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around a parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Squares Temporal Difference (LSTD) algorithm in an asymptotic regime in which both the number of parameters and the number of states go to infinity while their ratio remains constant. We derive deterministic limits of both the empirical and the true Mean-Squared Bellman Error (MSBE) that feature correction terms responsible for the double descent. These correction terms vanish when the $l_2$-regularization is increased or the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.
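To make the setting in the abstract concrete, the following is a minimal NumPy sketch of regularized LSTD on ReLU random features, together with the empirical MSBE. It is an illustration under stated assumptions, not the authors' code: the synthetic transitions, the Gaussian feature matrix `W`, the `lam * n` scaling of the ridge term, and all function names are hypothetical choices made here. Every sampled state is distinct, so the number of visited states equals `n`, and the ratio `N / n` plays the role of the parameter/state ratio.

```python
import numpy as np


def random_features(states, W):
    """ReLU random features: one row per state, one column per feature."""
    return np.maximum(states @ W.T, 0.0)


def regularized_lstd(phi, phi_next, rewards, gamma, lam):
    """Solve (Phi^T (Phi - gamma Phi') + lam * n * I) theta = Phi^T r.

    The lam * n scaling of the ridge term is an assumption of this sketch.
    """
    n, num_features = phi.shape
    A = phi.T @ (phi - gamma * phi_next) + lam * n * np.eye(num_features)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)


def empirical_msbe(phi, phi_next, rewards, gamma, theta):
    """Mean squared Bellman residual over the collected transitions."""
    residual = rewards + gamma * (phi_next @ theta) - phi @ theta
    return float(np.mean(residual ** 2))


# Synthetic transitions (hypothetical data, only for illustration).
rng = np.random.default_rng(0)
n, d, gamma, lam = 200, 5, 0.9, 1e-3
states = rng.normal(size=(n, d))
next_states = 0.9 * states + 0.1 * rng.normal(size=(n, d))
rewards = np.tanh(states[:, 0])

results = []
for N in (50, 200, 800):  # under-, critically-, and over-parameterized
    W = rng.normal(size=(N, d)) / np.sqrt(d)  # fixed, untrained weights
    phi = random_features(states, W)
    phi_next = random_features(next_states, W)
    theta = regularized_lstd(phi, phi_next, rewards, gamma, lam)
    msbe = empirical_msbe(phi, phi_next, rewards, gamma, theta)
    results.append((N / n, msbe))
    print(f"N/n = {N / n:.2f}, empirical MSBE = {msbe:.3e}")
```

The loop only reports the empirical MSBE on the visited transitions. Estimating the true MSBE discussed in the abstract would require evaluating the Bellman residual over the full state space, which this sketch does not attempt.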

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2310.05518 [cs.LG]
	(or arXiv:2310.05518v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.05518

Submission history

From: David Brellmann
[v1] Mon, 9 Oct 2023 08:33:22 UTC (8,925 KB)
[v2] Fri, 20 Oct 2023 09:23:53 UTC (8,925 KB)
[v3] Wed, 29 Nov 2023 20:54:05 UTC (9,228 KB)
[v4] Sun, 18 Feb 2024 10:34:45 UTC (895 KB)
