Smooth Non-Stationary Bandits

Jia, Su; Xie, Qian; Kallus, Nathan; Frazier, Peter I.

Computer Science > Machine Learning

arXiv:2301.12366 (cs)

[Submitted on 29 Jan 2023 (v1), last revised 17 Nov 2024 (this version, v3)]

Abstract: In many applications of online decision making, the environment is non-stationary, so it is crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time. In practice, however, environments often change smoothly, so such algorithms may incur higher-than-necessary regret. We study a non-stationary bandits problem where each arm's mean reward sequence can be embedded into a $\beta$-Hölder function, i.e., a function that is $(\beta-1)$ times differentiable with a Lipschitz-continuous $(\beta-1)$-th derivative. The non-stationarity becomes smoother as $\beta$ increases. When $\beta=1$, this corresponds to the non-smooth regime, where Besbes et al. (2014) established a minimax regret of $\tilde{\Theta}(T^{2/3})$. We show the first separation between the smooth (i.e., $\beta\ge 2$) and non-smooth (i.e., $\beta=1$) regimes by presenting a policy with $\tilde{O}(k^{4/5} T^{3/5})$ regret on any $k$-armed, $2$-Hölder instance. We complement this result by showing that the minimax regret on the $\beta$-Hölder family of instances is $\Omega(T^{(\beta+1)/(2\beta+1)})$ for any integer $\beta\ge 1$. This matches our upper bound for $\beta=2$ in its dependence on $T$, up to logarithmic factors. Furthermore, we validate the effectiveness of our policy through a comprehensive numerical study using real-world click-through rate data.
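The rates quoted in the abstract can be compared with a few lines of arithmetic. The sketch below is only an illustration, not the authors' policy or analysis: the helper names are hypothetical, and it drops all constants and logarithmic factors, so any crossover it reports is indicative only. It tabulates the lower-bound exponent $(\beta+1)/(2\beta+1)$, which equals $2/3$ at $\beta=1$ and $3/5$ at $\beta=2$, and compares $k^{4/5}T^{3/5}$ with $T^{2/3}$. Ignoring constants, the smooth bound is smaller exactly when $k^{4/5} < T^{1/15}$, i.e. when $T > k^{12}$.

```python
from fractions import Fraction


def lower_bound_exponent(beta: int) -> Fraction:
    """Exponent e in the Omega(T^e) minimax lower bound: e = (beta + 1) / (2 * beta + 1)."""
    if beta < 1:
        raise ValueError("beta must be an integer >= 1")
    return Fraction(beta + 1, 2 * beta + 1)


def smooth_upper_rate(k: float, T: float) -> float:
    """Hypothetical helper: k^(4/5) * T^(3/5), the 2-Hölder upper bound without constants or logs."""
    return k ** 0.8 * T ** 0.6


def non_smooth_rate(T: float) -> float:
    """Hypothetical helper: T^(2/3), the beta = 1 rate without constants or logs."""
    return T ** (2 / 3)


def crossover_horizon(k: int) -> int:
    """Horizon above which k^(4/5) T^(3/5) < T^(2/3), i.e. T > k^12 (constants ignored)."""
    return k ** 12


if __name__ == "__main__":
    # Exact exponents: they decrease toward 1/2 as the reward sequences get smoother.
    for beta in range(1, 5):
        print("beta =", beta, "exponent =", lower_bound_exponent(beta))
    # Compare the two rates for k = 2 arms at a horizon of one million rounds.
    k, T = 2, 10 ** 6
    print(crossover_horizon(k), smooth_upper_rate(k, T) < non_smooth_rate(T))
```

Running it prints the exponents 2/3, 3/5, 4/7 and 5/9 for $\beta = 1, \dots, 4$, then `4096 True`: with two arms, the $2$-Hölder rate undercuts the non-smooth rate once $T$ exceeds $2^{12} = 4096$, under the stated simplifications.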

Comments:	Accepted by ICML 2023
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST)
Cite as:	arXiv:2301.12366 [cs.LG]
	(or arXiv:2301.12366v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2301.12366

Submission history

From: Qian Xie
[v1] Sun, 29 Jan 2023 06:03:20 UTC (1,848 KB)
[v2] Wed, 7 Jun 2023 17:32:00 UTC (1,774 KB)
[v3] Sun, 17 Nov 2024 18:03:40 UTC (1,970 KB)