A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

Dong Ki Kim; Miao Liu; Matthew D Riemer; Chuangchuang Sun; Marwa Abdulhai; Golnaz Habibi; Sebastian Lopez-Cot; Gerald Tesauro; Jonathan How

A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

Dong Ki Kim, Miao Liu, Matthew D Riemer, Chuangchuang Sun, Marwa Abdulhai, Golnaz Habibi, Sebastian Lopez-Cot, Gerald Tesauro, Jonathan How

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5541-5550, 2021.

Abstract

A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other simultaneously learning agents. In particular, each agent perceives the environment as effectively non-stationary due to the changing policies of other agents. Moreover, each agent is itself constantly learning, leading to natural non-stationarity in the distribution of experiences encountered. In this paper, we propose a novel meta-multiagent policy gradient theorem that directly accounts for the non-stationary policy dynamics inherent to multiagent learning settings. This is achieved by modeling our gradient updates to consider both an agent’s own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment. We show that our theoretically grounded approach provides a general solution to the multiagent learning problem, which inherently comprises all key aspects of previous state of the art approaches on this topic. We test our method on a diverse suite of multiagent benchmarks and demonstrate a more efficient ability to adapt to new agents as they learn than baseline methods across the full spectrum of mixed incentive, competitive, and cooperative domains.

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-kim21g,
  title = 	 {A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning},
  author =       {Kim, Dong Ki and Liu, Miao and Riemer, Matthew D and Sun, Chuangchuang and Abdulhai, Marwa and Habibi, Golnaz and Lopez-Cot, Sebastian and Tesauro, Gerald and How, Jonathan},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {5541--5550},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/kim21g/kim21g.pdf},
  url = 	 {https://proceedings.mlr.press/v139/kim21g.html},
  abstract = 	 {A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other simultaneously learning agents. In particular, each agent perceives the environment as effectively non-stationary due to the changing policies of other agents. Moreover, each agent is itself constantly learning, leading to natural non-stationarity in the distribution of experiences encountered. In this paper, we propose a novel meta-multiagent policy gradient theorem that directly accounts for the non-stationary policy dynamics inherent to multiagent learning settings. This is achieved by modeling our gradient updates to consider both an agent’s own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment. We show that our theoretically grounded approach provides a general solution to the multiagent learning problem, which inherently comprises all key aspects of previous state of the art approaches on this topic. We test our method on a diverse suite of multiagent benchmarks and demonstrate a more efficient ability to adapt to new agents as they learn than baseline methods across the full spectrum of mixed incentive, competitive, and cooperative domains.}
}

Endnote

%0 Conference Paper
%T A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning
%A Dong Ki Kim
%A Miao Liu
%A Matthew D Riemer
%A Chuangchuang Sun
%A Marwa Abdulhai
%A Golnaz Habibi
%A Sebastian Lopez-Cot
%A Gerald Tesauro
%A Jonathan How
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-kim21g
%I PMLR
%P 5541--5550
%U https://proceedings.mlr.press/v139/kim21g.html
%V 139
%X A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other simultaneously learning agents. In particular, each agent perceives the environment as effectively non-stationary due to the changing policies of other agents. Moreover, each agent is itself constantly learning, leading to natural non-stationarity in the distribution of experiences encountered. In this paper, we propose a novel meta-multiagent policy gradient theorem that directly accounts for the non-stationary policy dynamics inherent to multiagent learning settings. This is achieved by modeling our gradient updates to consider both an agent’s own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment. We show that our theoretically grounded approach provides a general solution to the multiagent learning problem, which inherently comprises all key aspects of previous state of the art approaches on this topic. We test our method on a diverse suite of multiagent benchmarks and demonstrate a more efficient ability to adapt to new agents as they learn than baseline methods across the full spectrum of mixed incentive, competitive, and cooperative domains.

APA


Kim, D.K., Liu, M., Riemer, M.D., Sun, C., Abdulhai, M., Habibi, G., Lopez-Cot, S., Tesauro, G. & How, J.. (2021). A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5541-5550 Available from https://proceedings.mlr.press/v139/kim21g.html.

A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

Abstract

Cite this Paper

Related Material