Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning

Teney, Damien; Peyrard, Maxime; Abbasnejad, Ehsan

Computer Science > Machine Learning

arXiv:2207.02598 (cs)

[Submitted on 6 Jul 2022]


Abstract: Machine learning (ML) models are typically optimized for their accuracy on a given dataset. However, this predictive criterion rarely captures all desirable properties of a model, in particular how well it matches a domain expert's understanding of a task. Underspecification refers to the existence of multiple models that are indistinguishable in their in-domain accuracy, even though they differ in other desirable properties such as out-of-distribution (OOD) performance. Identifying these situations is critical for assessing the reliability of ML models.
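A minimal toy illustration of the definition follows. It is our own example, not one from the paper. In a hypothetical training domain, two binary features both equal the label, so a model that reads only feature 0 and a model that reads only feature 1 have the same in-domain accuracy. A hypothetical shift that decorrelates feature 1 from the label then separates them. The data, the shift, and the names `model_a`, `model_b` and `accuracy` are illustrative assumptions.

```python
# Hypothetical toy data: each input has two binary features, and in the
# training domain both features equal the label.
labels = [0, 1, 0, 1, 1, 0, 1, 0]
x_in = [(y, y) for y in labels]

# Hypothetical OOD shift: feature 1 is flipped on every other sample, so it
# is no longer predictive, while feature 0 stays equal to the label.
x_ood = [(a, 1 - b) if i % 2 == 0 else (a, b) for i, (a, b) in enumerate(x_in)]


def model_a(x):
    """Relies on feature 0 only."""
    return x[0]


def model_b(x):
    """Relies on feature 1 only."""
    return x[1]


def accuracy(model, xs, ys):
    """Fraction of samples where the model's prediction equals the label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


for name, model in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(model, x_in, labels), accuracy(model, x_ood, labels))
# A 1.0 1.0
# B 1.0 0.5
```

Both models score 1.0 in-domain, so in-domain accuracy gives no reason to prefer either; only the shifted data exposes that model B depends on a feature that stops being predictive.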
We formalize the concept of underspecification and propose a method to identify and partially address it. We train multiple models with an independence constraint that forces them to implement different functions. They discover predictive features that are otherwise ignored by standard empirical risk minimization (ERM), which we then distill into a global model with superior OOD performance. Importantly, we constrain the models to align with the data manifold to ensure that they discover meaningful features. We demonstrate the method on multiple datasets in computer vision (collages, WILDS-Camelyon17, GQA) and discuss general implications of underspecification. Most notably, in-domain performance cannot serve as a criterion for OOD model selection without additional assumptions.
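The abstract does not specify the form of the independence constraint, the model architecture, the manifold-alignment step, or the distillation step. The sketch below is therefore a stand-in under our own assumptions, not the authors' method. Two linear logistic models are trained jointly on hypothetical toy data whose two features are both perfectly predictive in-domain, and "independence" is approximated by penalizing the squared cosine similarity between the models' input gradients (for a linear model, the input gradient of the logit is simply its weight vector). The names `cos_sq_and_grads` and `train`, the penalty weight `lam`, the learning rate, and the starting weights are all illustrative. The manifold constraint and the distillation into a global model are not modeled here.

```python
import math

# Hypothetical toy data (not from the paper): two features encoded as +/-1,
# both equal to the label in the training domain, so either one suffices.
labels = [0, 1, 0, 1, 1, 0, 1, 0]
xs = [(2 * y - 1, 2 * y - 1) for y in labels]
data = list(zip(xs, labels))
# Hypothetical OOD set: feature 1 is flipped on every other sample.
ood = [((a, -b) if i % 2 == 0 else (a, b), y) for i, ((a, b), y) in enumerate(data)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_entropy_grad(w):
    """Gradient of the mean logistic loss of the linear model w on `data`."""
    g = [0.0, 0.0]
    for x, y in data:
        err = sigmoid(dot(w, x)) - y  # dLoss/dlogit for labels in {0, 1}
        for i in range(2):
            g[i] += err * x[i] / len(data)
    return g


def cos_sq_and_grads(wa, wb):
    """Squared cosine similarity of the two models' input gradients
    (here, their weight vectors) and its gradients w.r.t. wa and wb."""
    na, nb = math.sqrt(dot(wa, wa)), math.sqrt(dot(wb, wb))
    cos = dot(wa, wb) / (na * nb)
    ga = [2 * cos * (b / (na * nb) - cos * a / na**2) for a, b in zip(wa, wb)]
    gb = [2 * cos * (a / (na * nb) - cos * b / nb**2) for a, b in zip(wa, wb)]
    return cos**2, ga, gb


def train(lam=1.0, lr=0.1, steps=200):
    """Jointly minimize both task losses plus lam * squared cosine penalty."""
    # Hypothetical start: both models mix the two features.
    wa, wb = [1.0, 0.5], [0.5, 1.0]
    for _ in range(steps):
        _, pa, pb = cos_sq_and_grads(wa, wb)
        ca, cb = cross_entropy_grad(wa), cross_entropy_grad(wb)
        wa = [w - lr * (c + lam * p) for w, c, p in zip(wa, ca, pa)]
        wb = [w - lr * (c + lam * p) for w, c, p in zip(wb, cb, pb)]
    return wa, wb


def accuracy(w, samples):
    """Fraction of samples where sign(w . x) matches the label."""
    return sum((dot(w, x) > 0) == (y == 1) for x, y in samples) / len(samples)


wa, wb = train()
print("weights A/B:", [round(v, 2) for v in wa], [round(v, 2) for v in wb])
print("squared cosine:", round(cos_sq_and_grads(wa, wb)[0], 3))
print("in-domain accuracy:", accuracy(wa, data), accuracy(wb, data))
print("OOD accuracy:", accuracy(wa, ood), accuracy(wb, ood))
# in-domain accuracy: 1.0 1.0
# OOD accuracy: 1.0 0.5
```

Under these assumptions the penalty pushes model A to rely mainly on feature 0 and model B mainly on feature 1, while both keep perfect in-domain accuracy; the shifted set then shows that they implement different functions. Because the toy data and initialization are symmetric in the two features, the outcome here is easy to reason about; real settings need a nonlinear model and an autodiff framework to compute input gradients.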

Comments:	Long version of a paper accepted at the 2022 European Conference on Computer Vision (ECCV)
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2207.02598 [cs.LG]
	(or arXiv:2207.02598v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2207.02598

Submission history

From: Damien Teney
[v1] Wed, 6 Jul 2022 11:20:40 UTC (9,897 KB)