Thompson Sampling for the MNL-Bandit

Agrawal, Shipra; Avadhanula, Vashist; Goyal, Vineet; Zeevi, Assaf

arXiv:1706.00977 (cs)

[Submitted on 3 Jun 2017 (v1), last revised 3 Jan 2019 (this version, v7)]

Abstract: We consider a sequential subset selection problem under parameter uncertainty, where at each time step, the decision maker selects a subset of cardinality $K$ from $N$ possible items (arms), and observes a (bandit) feedback in the form of the index of one of the items in said subset, or none. Each item in the index set is ascribed a certain value (reward), and the feedback is governed by a Multinomial Logit (MNL) choice model whose parameters are a priori unknown. The objective of the decision maker is to maximize the expected cumulative rewards over a finite horizon $T$, or alternatively, minimize the regret relative to an oracle that knows the MNL parameters. We refer to this as the MNL-Bandit problem. This problem is representative of a larger family of exploration-exploitation problems that involve a combinatorial objective, and arise in several important application domains. We present an approach to adapt Thompson Sampling to this problem and show that it achieves near-optimal regret as well as attractive numerical performance.
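The problem setup described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm and contains no Thompson Sampling step; it only shows how MNL feedback, expected reward, and per-step regret against an oracle might be computed. The abstract does not state the choice formula, so the sketch assumes the standard MNL parameterization in which item i has weight v[i] and the "none" outcome has weight normalized to 1. It also assumes the oracle may pick any non-empty subset of size at most K, found here by brute force. All function and variable names are hypothetical.

```python
import itertools
import random


def choice_probabilities(subset, v):
    """MNL choice probabilities for an offered subset.

    Assumption: item i is chosen with probability v[i] / (1 + sum of v over
    the subset), and nothing is chosen (key None) with the remaining mass.
    """
    denom = 1.0 + sum(v[i] for i in subset)
    probs = {i: v[i] / denom for i in subset}
    probs[None] = 1.0 / denom
    return probs


def expected_reward(subset, v, r):
    """Expected one-step reward of offering `subset`; r[i] is item i's reward."""
    probs = choice_probabilities(subset, v)
    return sum(r[i] * probs[i] for i in subset)


def sample_feedback(subset, v, rng):
    """Draw one bandit observation: the index of the chosen item, or None."""
    probs = choice_probabilities(subset, v)
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def oracle_subset(v, r, K):
    """Brute-force best subset of size at most K, given the true weights v.

    Only practical for small N; used here as the regret benchmark.
    """
    n = len(v)
    candidates = (
        s for k in range(1, K + 1) for s in itertools.combinations(range(n), k)
    )
    return max(candidates, key=lambda s: expected_reward(s, v, r))


def per_step_regret(subset, v, r, K):
    """Expected reward lost by offering `subset` instead of the oracle's choice."""
    best = oracle_subset(v, r, K)
    return expected_reward(best, v, r) - expected_reward(subset, v, r)


if __name__ == "__main__":
    rng = random.Random(0)
    v = [1.0, 2.0, 0.5, 0.8]   # hypothetical MNL weights (unknown to the learner)
    r = [1.0, 0.4, 0.8, 0.6]   # hypothetical per-item rewards
    offered = (1, 3)
    print("feedback:", sample_feedback(offered, v, rng))
    print("regret of offering", offered, ":", per_step_regret(offered, v, r, K=2))
```

Summing `per_step_regret` over the subsets a learning policy offers across $T$ steps gives the cumulative regret that the abstract refers to. Restricting the oracle to subsets of exactly $K$ items would be a one-line change to the range in `oracle_subset`.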

Comments:	Accepted for presentation at Conference on Learning Theory (COLT) 2017
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1706.00977 [cs.LG]
	(or arXiv:1706.00977v7 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1706.00977

Submission history

From: Vashist Avadhanula
[v1] Sat, 3 Jun 2017 16:48:34 UTC (263 KB)
[v2] Tue, 13 Jun 2017 09:47:40 UTC (263 KB)
[v3] Sat, 1 Jul 2017 17:36:16 UTC (263 KB)
[v4] Sat, 27 Oct 2018 09:53:17 UTC (252 KB)
[v5] Wed, 31 Oct 2018 06:57:46 UTC (252 KB)
[v6] Wed, 19 Dec 2018 23:14:39 UTC (252 KB)
[v7] Thu, 3 Jan 2019 19:45:01 UTC (252 KB)
