On the near-optimality of local policies in large cooperative multi-agent reinforcement learning

WU Mondal, V Aggarwal, SV Ukkusuri - arXiv preprint arXiv:2209.03491, 2022 - arxiv.org
We show that in a cooperative $N$-agent network, one can design locally executable
policies for the agents such that the resulting discounted sum of average rewards (value)
well approximates the optimal value computed over all (including non-local) policies.
Specifically, we prove that, if $|\mathcal{X}|$ and $|\mathcal{U}|$ denote the sizes of the state and
action spaces of individual agents, then for a sufficiently small discount factor, the
approximation error is $\mathcal{O}(e)$, where $e\triangleq\frac{1}{\sqrt{N}}\left …
