Resilience in Numerical Methods: A Position on Fault Models and Methodologies

Elliott, James; Hoemmen, Mark; Mueller, Frank

arXiv:1401.3013 (cs)

[Submitted on 13 Jan 2014]

Abstract: Future extreme-scale computer systems may expose silent data corruption (SDC) to applications, in order to save energy or increase performance. However, resilience research struggles to come up with useful abstract programming models for reasoning about SDC. Existing work randomly flips bits in running applications, but this only shows average-case behavior for a low-level, artificial hardware model. Algorithm developers need to understand worst-case behavior with the higher-level data types they actually use, in order to make their algorithms more resilient. Also, we know so little about how SDC may manifest in future hardware that it seems premature to draw conclusions about the average case. We argue instead that numerical algorithms can benefit from a numerical unreliability fault model, where faults manifest as unbounded perturbations to floating-point data. Algorithms can use inexpensive "sanity" checks that bound or exclude error in the results of computations. Given a selective reliability programming model that requires reliability only when and where needed, such checks can make algorithms reliable despite unbounded faults. Sanity checks, and in general a healthy skepticism about the correctness of subroutines, are wise even if hardware is perfectly reliable.
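To make the fault model and the idea of a sanity check concrete, here is a minimal illustrative sketch. It is not the specific check from the paper. It assumes a Gram-Schmidt or Arnoldi-style projection h = q^T (A v) with a unit-norm vector q. By Cauchy-Schwarz, |h| <= ||A||_2 ||v||_2 <= ||A||_F ||v||_2, so any result that is NaN, infinite, or larger than that bound cannot be correct. The names `inject_fault`, `projection_is_sane`, `checked_projection`, and the `rel_slack` parameter are hypothetical. The fault injector scales one entry by an arbitrary factor, mimicking an unbounded perturbation of floating-point data rather than a bit flip. The "reliable" retry simply reruns the same code, standing in for execution under selective reliability.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product y = A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def norm2(x: Vector) -> float:
    return math.sqrt(dot(x, x))


def frobenius_norm(A: Matrix) -> float:
    # ||A||_F >= ||A||_2, so it is a cheap, safe upper bound on the 2-norm.
    return math.sqrt(sum(a * a for row in A for a in row))


def inject_fault(x: Vector, index: int, factor: float) -> Vector:
    """Hypothetical fault injector: scale one entry by an arbitrary factor.

    Models an unbounded perturbation of floating-point data rather than a
    bit flip; `factor` may be huge, tiny, negative, inf, or nan.
    """
    y = list(x)
    y[index] *= factor
    return y


def projection_is_sane(h: float, A_norm: float, v: Vector,
                       rel_slack: float = 1e-12) -> bool:
    """Sanity check for h = q^T (A v), assuming ||q||_2 = 1.

    Cauchy-Schwarz gives |h| <= ||A v||_2 <= ||A||_2 ||v||_2 <= ||A||_F ||v||_2.
    `rel_slack` (hypothetical) absorbs rounding error in the bound itself.
    """
    if not math.isfinite(h):
        return False
    return abs(h) <= (1.0 + rel_slack) * A_norm * norm2(v)


def checked_projection(
    A: Matrix, A_norm: float, q: Vector, v: Vector,
    unreliable_matvec: Optional[Callable[[Matrix, Vector], Vector]] = None,
) -> float:
    """Compute h = q^T (A v) unreliably, then verify it.

    If the check fails, recompute. Under selective reliability the retry
    would run in reliable mode; here it is simply `matvec` again.
    """
    w = (unreliable_matvec or matvec)(A, v)
    h = dot(q, w)
    if projection_is_sane(h, A_norm, v):
        return h
    return dot(q, matvec(A, v))


def faulty_matvec(M: Matrix, x: Vector) -> Vector:
    # Simulated unreliable kernel: perturb the first entry of A x by 1e6.
    return inject_fault(matvec(M, x), 0, 1e6)


A = [[4.0, 1.0], [2.0, 3.0]]
q = [1.0, 0.0]               # unit-norm vector
v = [1.0, 0.0]
A_norm = frobenius_norm(A)   # sqrt(30), about 5.48
print(checked_projection(A, A_norm, q, v, faulty_matvec))  # prints 4.0
```

The check does not catch every fault: a perturbation that keeps |h| within the bound passes undetected. What it does guarantee is that an accepted value satisfies |h| <= B with B = ||A||_F ||v||_2 (up to the rounding slack). Because the true value obeys the same bound, the error in any accepted result is at most about 2B. This is the sense in which such a check can "bound or exclude" error.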

Comments:	Position Paper
Subjects:	Mathematical Software (cs.MS); Emerging Technologies (cs.ET); Numerical Analysis (math.NA)
Cite as:	arXiv:1401.3013 [cs.MS]
	(or arXiv:1401.3013v1 [cs.MS] for this version)
	https://doi.org/10.48550/arXiv.1401.3013

Submission history

From: James Elliott
[v1] Mon, 13 Jan 2014 21:18:48 UTC (29 KB)
