Learning Populations of Parameters

Tian, Kevin; Kong, Weihao; Valiant, Gregory

arXiv:1709.02707 (cs)

[Submitted on 8 Sep 2017 (v1), last revised 22 Nov 2017 (this version, v2)]

Abstract: Consider the following estimation problem: there are $n$ entities, each with an unknown parameter $p_i \in [0,1]$, and we observe $n$ independent random variables, $X_1,\ldots,X_n$, with $X_i \sim \mathrm{Binomial}(t, p_i)$. How accurately can one recover the "histogram" (i.e., the cumulative distribution function) of the $p_i$'s? While the empirical estimates $X_i/t$ would recover the histogram to earth mover distance (equivalently, $\ell_1$ distance between the CDFs) $\Theta(\frac{1}{\sqrt{t}})$, we show that, provided $n$ is sufficiently large, we can achieve error $O(\frac{1}{t})$, which is information-theoretically optimal. We also extend our results to the multi-dimensional parameter case, capturing settings where each member of the population has multiple associated parameters. Beyond the theoretical results, we demonstrate that the recovery algorithm performs well in practice on a variety of datasets, providing illuminating insights into several domains, including politics, sports analytics, and variation in the gender ratio of offspring.
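The setup in the abstract can be simulated directly. The sketch below illustrates only the naive empirical baseline $X_i/t$, not the paper's recovery algorithm, which the abstract does not specify. The function names and the choice of a population with every $p_i = 0.5$ are assumptions made for illustration. On such a concentrated population the baseline's earth mover error should shrink roughly like $1/\sqrt{t}$.

```python
import random

def simulate_counts(ps, t, rng):
    """Draw X_i ~ Binomial(t, p_i) for each p_i as a sum of t Bernoulli trials."""
    return [sum(rng.random() < p for _ in range(t)) for p in ps]

def emd_equal_size(a, b):
    """Earth mover distance between two equal-size empirical distributions on the line.

    For 1-D samples of equal size this is the mean absolute difference of the
    sorted values, which equals the l1 distance between the two empirical CDFs.
    """
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def empirical_baseline_error(ps, t, seed=0):
    """EMD between the true parameters and the naive estimates X_i / t."""
    rng = random.Random(seed)
    xs = simulate_counts(ps, t, rng)
    p_hat = [x / t for x in xs]
    return emd_equal_size(ps, p_hat)

# Hypothetical population (an assumption, not data from the paper):
# every entity has the same parameter 0.5.
population = [0.5] * 2000
err_t10 = empirical_baseline_error(population, t=10)
err_t40 = empirical_baseline_error(population, t=40)
```

Quadrupling $t$ from 10 to 40 should roughly halve the baseline error in this simulation, consistent with the $\Theta(\frac{1}{\sqrt{t}})$ rate quoted for empirical estimates.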

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:1709.02707 [cs.LG] (or arXiv:1709.02707v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.1709.02707

Submission history

From: Weihao Kong
[v1] Fri, 8 Sep 2017 13:53:26 UTC (180 KB)
[v2] Wed, 22 Nov 2017 06:07:44 UTC (534 KB)
