(More) Efficient Reinforcement Learning via Posterior Sampling

Osband, Ian; Russo, Daniel; Van Roy, Benjamin

arXiv:1306.0940 (stat)

[Submitted on 4 Jun 2013 (v1), last revised 26 Dec 2013 (this version, v5)]

Abstract: Most provably efficient learning algorithms introduce optimism about poorly understood states and actions to encourage exploration. We study an alternative approach for efficient exploration, posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates its posterior distribution over Markov decision processes using the data observed so far and takes one sample from this posterior. PSRL then follows the policy that is optimal for this sample during the episode. The algorithm is conceptually simple, computationally efficient and allows an agent to encode prior knowledge in a natural way. We establish an $\tilde{O}(\tau S \sqrt{AT})$ bound on the expected regret, where $T$ is the elapsed time, $\tau$ is the episode length and $S$ and $A$ are the cardinalities of the state and action spaces. This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm. We show through simulation that PSRL significantly outperforms existing algorithms with similar regret bounds.
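The episode loop described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation. It assumes a Dirichlet prior over transition probabilities, a standard normal prior over mean rewards with unit-variance Gaussian reward noise, and a fixed initial state 0. The function names `solve_finite_horizon` and `psrl` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def solve_finite_horizon(P, R, tau):
    """Backward induction for a finite-horizon MDP.

    P has shape (S, A, S), R has shape (S, A).
    Returns a time-dependent greedy policy of shape (tau, S).
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    policy = np.zeros((tau, S), dtype=int)
    for h in reversed(range(tau)):
        Q = R + P @ V              # (S, A): reward plus expected next value
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy


def psrl(true_P, true_R, tau, num_episodes, seed=0):
    """Posterior sampling sketch; returns the expected reward collected per episode."""
    rng = np.random.default_rng(seed)
    S, A, _ = true_P.shape
    # Assumed priors: Dirichlet(1, ..., 1) on each transition row,
    # N(0, 1) on each mean reward with unit-variance observation noise.
    dirichlet_counts = np.ones((S, A, S))
    reward_sum = np.zeros((S, A))
    visits = np.zeros((S, A))
    episode_returns = []

    for _ in range(num_episodes):
        # Take one sample of the MDP from the current posterior.
        P_sample = np.empty((S, A, S))
        for s in range(S):
            for a in range(A):
                P_sample[s, a] = rng.dirichlet(dirichlet_counts[s, a])
        post_mean = reward_sum / (visits + 1.0)
        post_std = 1.0 / np.sqrt(visits + 1.0)
        R_sample = rng.normal(post_mean, post_std)

        # Follow the policy that is optimal for the sample for the whole episode.
        policy = solve_finite_horizon(P_sample, R_sample, tau)
        s = 0                      # assumed fixed initial state
        total = 0.0
        for h in range(tau):
            a = policy[h, s]
            r = true_R[s, a] + rng.normal()          # noisy observed reward
            s_next = rng.choice(S, p=true_P[s, a])
            # Record the data; it only affects the sample drawn next episode.
            dirichlet_counts[s, a, s_next] += 1
            reward_sum[s, a] += r
            visits[s, a] += 1
            total += true_R[s, a]
            s = s_next
        episode_returns.append(total)
    return np.array(episode_returns)
```

Calling `psrl(true_P, true_R, tau=10, num_episodes=200)` on a small MDP returns one value per episode. The sampled MDP and its policy are held fixed within an episode, and exploration comes only from the randomness of the posterior sample rather than from an optimism bonus.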

Comments:	10 pages
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1306.0940 [stat.ML]
	(or arXiv:1306.0940v5 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1306.0940

Submission history

From: Ian Osband
[v1] Tue, 4 Jun 2013 23:00:56 UTC (179 KB)
[v2] Wed, 25 Sep 2013 05:14:31 UTC (183 KB)
[v3] Thu, 26 Sep 2013 00:38:51 UTC (183 KB)
[v4] Wed, 13 Nov 2013 19:31:26 UTC (183 KB)
[v5] Thu, 26 Dec 2013 09:20:29 UTC (184 KB)
