Regret Minimization for Partially Observable Deep Reinforcement Learning

Jin, Peter; Keutzer, Kurt; Levine, Sergey

arXiv:1710.11424 (cs)

[Submitted on 31 Oct 2017 (v1), last revised 25 Oct 2018 (this version, v2)]

Abstract: Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial observations by using finite-length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to an advantage-like function and is robust to partially observed state. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting in the presence of partially observed objects in Doom and Pong.
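The abstract builds on counterfactual regret minimization (CFR). As a hedged illustration of one core CFR ingredient, and not the paper's own algorithm, the sketch below shows regret matching: each action's probability is proportional to the positive part of its cumulative advantage-like (regret) estimate. It assumes a single observation, a small discrete action set, and fixed known action values in place of a learned deep network; the function names `regret_matching_policy` and `run_regret_matching` are hypothetical.

```python
import numpy as np


def regret_matching_policy(cumulative_regret: np.ndarray) -> np.ndarray:
    """Regret matching: normalize the positive part of per-action regrets.

    If no action has positive regret, fall back to the uniform policy.
    """
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    num_actions = cumulative_regret.shape[0]
    return np.full(num_actions, 1.0 / num_actions)


def run_regret_matching(action_values: np.ndarray, iterations: int) -> np.ndarray:
    """Hypothetical tabular loop: accumulate advantage-like regrets over iterations.

    action_values is an assumed fixed vector of per-action values; a deep RL
    method would instead estimate these from sampled experience.
    """
    action_values = np.asarray(action_values, dtype=float)
    cumulative_regret = np.zeros_like(action_values)
    for _ in range(iterations):
        policy = regret_matching_policy(cumulative_regret)
        expected_value = policy @ action_values  # value of the current policy
        # Advantage of each action relative to the current policy's value.
        cumulative_regret += action_values - expected_value
    return regret_matching_policy(cumulative_regret)


final_policy = run_regret_matching(np.array([1.0, 0.0, 0.5]), iterations=10)
print(final_policy)
```

Clipping at zero means actions that have performed worse than the current policy receive no probability until their accumulated advantage turns positive again, and the uniform fallback keeps the policy well defined at the first iteration.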

Comments:	ICML 2018
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:1710.11424 [cs.LG]
	(or arXiv:1710.11424v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1710.11424

Submission history

From: Peter Jin
[v1] Tue, 31 Oct 2017 12:15:38 UTC (450 KB)
[v2] Thu, 25 Oct 2018 00:58:42 UTC (3,236 KB)
