Handling silent data corruption with the sparse grid combination technique

AP Hinojosa, B Harding, M Hegland… - Software for Exascale …, 2016 - Springer
Software for Exascale Computing - SPPEXA 2013-2015, 2016, Springer
Abstract
We describe two algorithms to detect and filter silent data corruption (SDC) when solving time-dependent PDEs with the Sparse Grid Combination Technique (SGCT). The SGCT solves a PDE on many regular full grids of different resolutions, which are then combined to obtain a high-quality solution. The algorithm can be parallelized and run on large HPC systems. We investigate silent data corruption and show that the SGCT can be used with minor modifications to filter corrupted data and obtain good results. We apply sanity checks before combining the solution fields to make sure that the data is not corrupted. These sanity checks are derived from well-known error bounds of the classical theory of the SGCT and do not rely on checksums or data replication. We apply our algorithms to a 2D advection equation and discuss the main advantages and drawbacks.
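To make the idea concrete, the following is a minimal sketch of the classical 2D combination formula, in which the component grids with level sum n+1 are added and those with level sum n are subtracted, followed by a simple pre-combination check. It is an illustration under stated assumptions, not the algorithms of the paper: the component "solutions" are bilinear interpolants of a known smooth function rather than PDE solves, the check is a plain median-deviation test rather than the error-bound-derived checks the abstract refers to, and all names (`f`, `bilinear`, `component_values`, `combine`, `suspicious`) and the tolerance are hypothetical.

```python
import math
from statistics import median


def f(x, y):
    """Smooth test function standing in for a PDE solution (assumption)."""
    return math.sin(math.pi * x) * math.sin(math.pi * y)


def bilinear(i, j, x, y):
    """Bilinear interpolant of f on a grid with 2**i by 2**j cells, at (x, y)."""
    nx, ny = 2 ** i, 2 ** j
    cx = min(int(x * nx), nx - 1)   # index of the cell containing x
    cy = min(int(y * ny), ny - 1)   # index of the cell containing y
    tx, ty = x * nx - cx, y * ny - cy   # local coordinates in [0, 1]
    x0, x1 = cx / nx, (cx + 1) / nx
    y0, y1 = cy / ny, (cy + 1) / ny
    return ((1 - tx) * (1 - ty) * f(x0, y0) + tx * (1 - ty) * f(x1, y0)
            + (1 - tx) * ty * f(x0, y1) + tx * ty * f(x1, y1))


def component_values(n, x, y):
    """Values at (x, y) on every component grid (i, j) with i + j in {n, n + 1}."""
    vals = {}
    for level_sum in (n, n + 1):
        for i in range(1, level_sum):
            vals[(i, level_sum - i)] = bilinear(i, level_sum - i, x, y)
    return vals


def combine(vals, n):
    """Classical 2D combination: +1 for i + j = n + 1, -1 for i + j = n."""
    return sum((1 if i + j == n + 1 else -1) * v for (i, j), v in vals.items())


def suspicious(vals, tol):
    """Hypothetical check: grids whose value is far from the median value."""
    m = median(vals.values())
    return [k for k, v in vals.items() if abs(v - m) > tol]


if __name__ == "__main__":
    n, x, y = 6, 0.3, 0.7
    vals = component_values(n, x, y)
    print("combined:", combine(vals, n), "exact:", f(x, y))
    vals[(3, 3)] = 1e6   # simulate silent corruption of one component grid
    print("flagged grids:", suspicious(vals, tol=0.5))
```

Note that flagging a grid is only the first step: once a component grid is discarded, the remaining combination coefficients generally have to be recomputed, which this sketch does not attempt.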