In competitive Markov decision processes (CoMDPs), competing agents (players) interact with each other and with the environment, and through these interactions each player learns to refine its behavior and improve its policy.
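The competitive interaction loop can be sketched on the simplest possible case, a one-step zero-sum matrix game. The `MatchingPennies` environment and the random policies below are illustrative stand-ins, not part of any particular CoMDP implementation:

```python
# Hedged sketch: the interaction loop of a two-player competitive (zero-sum)
# setting. MatchingPennies and random_policy are illustrative assumptions.
import random

class MatchingPennies:
    """One-step zero-sum game: player 1 wins when the two actions match."""
    def step(self, a1, a2):
        r1 = 1.0 if a1 == a2 else -1.0
        return r1, -r1  # the players always receive opposite rewards

def random_policy():
    return random.choice([0, 1])  # pick heads (0) or tails (1) uniformly

env = MatchingPennies()
total1 = total2 = 0.0
for _ in range(1000):
    r1, r2 = env.step(random_policy(), random_policy())
    total1 += r1
    total2 += r2
print(total1 + total2 == 0.0)  # zero-sum: the returns cancel exactly
```

In a full CoMDP the game has state and runs over many steps, but the structure is the same: both players act, and one player's gain is the other's loss.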
To learn in this setting, the authors propose competitive policy optimization (CoPO), a novel policy gradient paradigm that exploits the game-theoretic nature of competitive games to derive policy updates. CoPO is instantiated in two algorithms: (i) competitive policy gradient (CoPG) and (ii) trust-region competitive policy optimization (TRCoPO), both of which are studied theoretically.
In CoPO, each player optimizes its strategy by accounting for its interactions with both the environment and the opponent through a game-theoretic bilinear approximation of the game objective.
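The flavor of such a bilinear, opponent-aware update can be illustrated on the scalar zero-sum game f(x, y) = x·y, where player x minimizes f and player y maximizes it, with Nash equilibrium at (0, 0). The sketch below uses the competitive-gradient update of Schäfer and Anandkumar (2019), which CoPO builds on; it is a toy assumption-laden illustration of the bilinear idea, not the CoPO policy update itself, and the step size is illustrative:

```python
# Hedged sketch: an opponent-aware bilinear update on f(x, y) = x * y.
# For this game: grad_x f = y, grad_y f = x, and the mixed second derivative
# D_xy f = 1, so the competitive-gradient preconditioner reduces to the
# scalar 1 / (1 + eta^2). Plain simultaneous gradient descent-ascent cycles
# on this game, while the bilinear update converges to the Nash point (0, 0).

def competitive_step(x, y, eta=0.5):
    """One competitive-gradient update for f(x, y) = x * y."""
    scale = eta / (1.0 + eta ** 2)
    dx = -scale * (y + eta * x)   # x's step anticipates y's counter-move
    dy = scale * (x - eta * y)    # y's step anticipates x's counter-move
    return x + dx, y + dy

def naive_step(x, y, eta=0.5):
    """Simultaneous gradient descent-ascent, which spirals outward here."""
    return x - eta * y, y + eta * x

x, y = 1.0, 1.0
for _ in range(200):
    x, y = competitive_step(x, y)
print(abs(x) < 1e-4 and abs(y) < 1e-4)  # converged to the Nash equilibrium
```

The bilinear cross term (the `eta * x` and `eta * y` corrections) is what encodes "my gradient, given that my opponent also moves"; dropping it recovers the naive update, which fails to converge on this game.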
An accompanying repository contains the code and experiments for the competitive policy gradient (CoPG) algorithm, along with a link to the paper.