Model-Free Learning of Optimal Ergodic Policies in Wireless Systems

Kalogerias, Dionysios S.; Eisen, Mark; Pappas, George J.; Ribeiro, Alejandro

doi:10.1109/TSP.2020.3030073

Abstract:Learning optimal resource allocation policies in wireless systems can be effectively achieved by formulating finite dimensional constrained programs which depend on system configuration, as well as the adopted learning parameterization. The interest here is in cases where system models are unavailable, prompting methods that probe the wireless system with candidate policies, and then use observed performance to determine better policies. This generic procedure is difficult because of the need to cull accurate gradient estimates out of these limited system queries. This paper constructs and exploits smoothed surrogates of constrained ergodic resource allocation problems, the gradients of the former being representable exactly as averages of finite differences that can be obtained through limited system probing. Leveraging this unique property, we develop a new model-free primal-dual algorithm for learning optimal ergodic resource allocations, while we rigorously analyze the relationships between original policy search problems and their surrogates, in both primal and dual domains. First, we show that both primal and dual domain surrogates are uniformly consistent approximations of their corresponding original finite dimensional counterparts. Upon further assuming the use of near-universal policy parameterizations, we also develop explicit bounds on the gap between optimal values of initial, infinite dimensional resource allocation problems, and dual values of their parameterized smoothed surrogates. In fact, we show that this duality gap decreases at a linear rate relative to smoothing and universality parameters. Thus, it can be made arbitrarily small at will, also justifying our proposed primal-dual algorithmic recipe. Numerical simulations confirm the effectiveness of our approach.

Comments:	13 pages, 4 figures
Subjects:	Systems and Control (eess.SY); Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:1911.03988 [eess.SY]
	(or arXiv:1911.03988v1 [eess.SY] for this version)
	https://doi.org/10.48550/arXiv.1911.03988
Related DOI:	https://doi.org/10.1109/TSP.2020.3030073

Electrical Engineering and Systems Science > Systems and Control

Title:Model-Free Learning of Optimal Ergodic Policies in Wireless Systems

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators