Data-driven Error Estimation: Upper Bounding Multiple Errors with No Technical Debt

Krishnamurthy, Sanath Kumar; Athey, Susan; Brunskill, Emma

Computer Science > Machine Learning

arXiv:2405.04636 (cs)

[Submitted on 7 May 2024]



Abstract: We formulate the problem of constructing multiple simultaneously valid confidence intervals (CIs) as estimating a high probability upper bound on the maximum error for a class/set of estimate-estimand-error tuples, and refer to this as the error estimation problem. For a single such tuple, data-driven confidence intervals can often be used to bound the error in our estimate. However, for a class of estimate-estimand-error tuples, nontrivial high probability upper bounds on the maximum error often require class complexity as input, which limits the practicality of such methods and often results in loose bounds. Rather than deriving theoretical class complexity-based bounds, we propose a completely data-driven approach to estimate an upper bound on the maximum error. The simple and general nature of our solution to this fundamental challenge lends itself to several applications, including: multiple CI construction, multiple hypothesis testing, estimating excess risk bounds (a fundamental measure of uncertainty in machine learning) for any training/fine-tuning algorithm, and enabling the development of a contextual bandit pipeline that can leverage any reward model estimation procedure as input (without additional mathematical analysis).
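
One way to write down the error estimation problem described in the abstract is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: the index set $\mathcal{H}$, the tuples $(\hat{\theta}_h, \theta_h, e_h)$, the bound $\hat{\xi}$, and the failure probability $\delta$ are all hypothetical names.

```latex
% Assumed notation (not from the paper):
%   \mathcal{H}      index set for the class of tuples
%   \hat{\theta}_h   estimate, \theta_h estimand, e_h nonnegative error function
%   \hat{\xi}        data-driven upper bound, \delta \in (0,1) failure probability
\[
  \Pr\!\left( \max_{h \in \mathcal{H}} e_h\big(\hat{\theta}_h, \theta_h\big) \le \hat{\xi} \right) \ge 1 - \delta .
\]
% Special case: scalar estimands with absolute error
% e_h(\hat{\theta}_h, \theta_h) = |\hat{\theta}_h - \theta_h|.
% On the event above, every interval below contains its estimand at once:
\[
  \theta_h \in \big[\, \hat{\theta}_h - \hat{\xi},\; \hat{\theta}_h + \hat{\xi} \,\big]
  \quad \text{for all } h \in \mathcal{H} .
\]
```

Under these assumptions, a single bound $\hat{\xi}$ on the maximum error yields simultaneously valid confidence intervals with coverage at least $1 - \delta$, which is the connection between error estimation and multiple CI construction that the abstract draws.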

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2405.04636 [cs.LG]
	(or arXiv:2405.04636v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.04636

Submission history

From: Sanath Kumar Krishnamurthy
[v1] Tue, 7 May 2024 19:38:26 UTC (22 KB)
