Shapley Q-value: A Local Reward Approach to Solve Global Reward Games

Wang, Jianhong; Zhang, Yuan; Kim, Tae-Kyun; Gu, Yunjie

doi:10.1609/aaai.v34i05.6220

arXiv:1907.05707 (cs)

[Submitted on 11 Jul 2019 (v1), last revised 13 Oct 2022 (this version, v6)]

Abstract: Cooperative games are a critical research area in multi-agent reinforcement learning (MARL). The global reward game is a subclass of cooperative games in which all agents aim to maximize the global reward. Credit assignment is an important problem in the global reward game. Most previous works adopted a non-cooperative game-theoretical framework with the shared reward approach, i.e., each agent is directly assigned the shared global reward. This, however, may give each agent an inaccurate reward for its contribution to the group, which can cause inefficient learning. To address this problem, we i) introduce a cooperative game-theoretical framework called the extended convex game (ECG), which is a superset of the global reward game, and ii) propose a local reward approach called Shapley Q-value. In contrast to the shared reward approach, Shapley Q-value distributes the global reward so that it reflects each agent's own contribution. Moreover, we derive an MARL algorithm called Shapley Q-value deep deterministic policy gradient (SQDDPG), which uses Shapley Q-value as the critic for each agent. We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator, and Traffic Junction, comparing it with state-of-the-art algorithms, e.g., MADDPG, COMA, Independent DDPG, and Independent A2C. In the experiments, SQDDPG shows a significant improvement in convergence rate. Finally, we plot Shapley Q-value and validate its fair credit assignment property.
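To make the contrast between a shared reward and a local reward concrete, here is a minimal sketch of the classical Shapley value, the cooperative-game concept that the name Shapley Q-value refers to. It is not the SQDDPG algorithm and does not reproduce anything from the paper: the three agents, the `toy_value` coalition function, and the helper `shapley_values` are all hypothetical names chosen for illustration. The sketch averages each agent's marginal contribution over every ordering of the agents, which is exact but only practical for a handful of agents.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    agents: Sequence[str],
    coalition_value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values: mean marginal contribution over all agent orderings."""
    totals = {agent: 0.0 for agent in agents}
    orderings = list(permutations(agents))
    for ordering in orderings:
        coalition: FrozenSet[str] = frozenset()
        for agent in ordering:
            with_agent = coalition | {agent}
            # Marginal contribution of `agent` when joining the current coalition.
            totals[agent] += coalition_value(with_agent) - coalition_value(coalition)
            coalition = with_agent
    return {agent: total / len(orderings) for agent, total in totals.items()}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical coalition value (an assumption, not taken from the paper)."""
    base = {"a": 1.0, "b": 2.0, "c": 3.0}
    value = sum(base[agent] for agent in coalition)
    if {"a", "b"} <= coalition:
        value += 2.0  # agents a and b earn a bonus only when they cooperate
    return value


agents = ["a", "b", "c"]
global_reward = toy_value(frozenset(agents))    # 8.0
local_rewards = shapley_values(agents, toy_value)
shared_rewards = {agent: global_reward for agent in agents}

print(local_rewards)   # {'a': 2.0, 'b': 3.0, 'c': 3.0}
print(shared_rewards)  # {'a': 8.0, 'b': 8.0, 'c': 8.0}
```

In this toy game the local rewards sum to the global reward of 8.0 and split the cooperation bonus evenly between `a` and `b`, whereas the shared reward approach hands every agent the same 8.0 regardless of contribution.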

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cite as:	arXiv:1907.05707 [cs.LG]
	(or arXiv:1907.05707v6 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1907.05707
Journal reference:	AAAI 2020
Related DOI:	https://doi.org/10.1609/aaai.v34i05.6220

Submission history

From: Jianhong Wang [view email]
[v1] Thu, 11 Jul 2019 15:12:33 UTC (1,336 KB)
[v2] Mon, 9 Sep 2019 09:25:00 UTC (1,242 KB)
[v3] Thu, 21 Nov 2019 21:19:38 UTC (1,298 KB)
[v4] Mon, 25 Nov 2019 11:26:06 UTC (1,298 KB)
[v5] Tue, 24 Nov 2020 17:03:53 UTC (1,298 KB)
[v6] Thu, 13 Oct 2022 08:54:11 UTC (1,298 KB)
