Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory

Su, Lili; Zubeldia, Martin; Lynch, Nancy

arXiv:1811.03968 (cs)

[Submitted on 8 Nov 2018 (v1), last revised 23 Dec 2018 (this version, v3)]

Abstract: We consider multi-armed bandit problems in social groups wherein each individual has bounded memory and shares the common goal of learning the best arm/option. We say an individual learns the best option if eventually (as $t\to \infty$) it pulls only the arm with the highest expected reward. While this goal is provably impossible for an isolated individual due to bounded memory, we show that, in social groups, it can be achieved easily with the aid of social persuasion (i.e., communication) as long as the communication networks/graphs satisfy some mild conditions. To deal with the interplay between the randomness in the rewards and in the social interaction, we employ the mean-field approximation method. Because the individuals in the network may not be exchangeable when the communication graph is not a clique, we go beyond the classic mean-field techniques and apply a refined version of mean-field approximation:
(1) Using coupling, we show that if the communication graph is connected and is either regular or has a doubly stochastic degree-weighted adjacency matrix, then, with probability $\to 1$ as the social group size $N \to \infty$, every individual in the social group learns the best option.
(2) If the minimum degree of the graph diverges as $N \to \infty$, over an arbitrary but given finite time horizon, the sample paths describing the opinion evolutions of the individuals are asymptotically independent. In addition, the proportions of the population with different opinions converge to the unique solution of a system of ODEs. In the solution of the obtained ODEs, the proportion of the population holding the correct opinion converges to $1$ exponentially fast in time.
Notably, our results hold even if the communication graphs are highly sparse.
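The graph condition in result (1) can be checked mechanically. The sketch below is illustrative and is not taken from the paper. It assumes an undirected simple graph given as an adjacency-list dict, and it assumes that the "degree-weighted adjacency matrix" means $W = D^{-1}A$, with $W_{ij} = 1/\deg(i)$ when $j$ is a neighbor of $i$. The paper's exact definition may differ, and all function names here are hypothetical.

```python
from collections import deque
from fractions import Fraction


def is_connected(adj):
    """Breadth-first search from an arbitrary node; True if every node is reached."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def is_regular(adj):
    """True if every node has the same degree."""
    return len({len(nbrs) for nbrs in adj.values()}) <= 1


def degree_weighted_is_doubly_stochastic(adj):
    """Assumed definition: W[i][j] = 1/deg(i) if j is a neighbor of i, else 0.

    Rows of W sum to 1 by construction (for nodes with degree >= 1), so only
    the column sums need checking. Fractions keep the comparison exact.
    """
    if any(len(nbrs) == 0 for nbrs in adj.values()):
        return False
    col_sum = {j: Fraction(0) for j in adj}
    for i, nbrs in adj.items():
        for j in nbrs:
            col_sum[j] += Fraction(1, len(nbrs))
    return all(s == 1 for s in col_sum.values())


def satisfies_condition_1(adj):
    """Connected, and either regular or with a doubly stochastic D^{-1} A."""
    return is_connected(adj) and (
        is_regular(adj) or degree_weighted_is_doubly_stochastic(adj)
    )


cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(satisfies_condition_1(cycle4), satisfies_condition_1(star))
```

For a connected undirected graph, $D^{-1}A$ is the transition matrix of a simple random walk. Its unique stationary distribution is proportional to the degrees, so the matrix is doubly stochastic exactly when the graph is regular. The second test therefore adds something only for directed graphs or for a different weighting than the one assumed here.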

Comments:	arXiv admin note: text overlap with arXiv:1802.08159. Authors' note: This work shares some overlap with our preliminary preprint arXiv:1802.08159, which focuses on complete graphs. arXiv:1802.08159 is combined with this work.
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Cite as:	arXiv:1811.03968 [cs.DC]
	(or arXiv:1811.03968v3 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1811.03968

Submission history

From: Lili Su
[v1] Thu, 8 Nov 2018 03:59:55 UTC (108 KB)
[v2] Mon, 12 Nov 2018 02:17:41 UTC (108 KB)
[v3] Sun, 23 Dec 2018 05:16:40 UTC (89 KB)
