On the suitability of MPI as a PGAS runtime
2014 21st International Conference on High Performance Computing …, 2014 (ieeexplore.ieee.org)
Partitioned Global Address Space (PGAS) models are emerging as a popular alternative to MPI models for designing scalable applications. At the same time, MPI remains a ubiquitous communication subsystem due to its standardization, high performance, and availability on leading platforms. In this paper, we explore the suitability of using MPI as a scalable PGAS communication subsystem. We focus on the Remote Memory Access (RMA) communication in PGAS models, which typically includes get, put, and atomic memory operations. We perform an in-depth exploration of design alternatives based on MPI. These alternatives include using a semantically matching interface such as MPI-RMA, as well as less intuitive interfaces such as MPI two-sided combined with multi-threading and dynamic process management. Building on this exploration of these alternatives and their shortcomings, we propose a novel design that is facilitated by the data-centric view in PGAS models. This design leverages a combination of highly tuned MPI two-sided semantics and an automatic, user-transparent split of MPI communicators to provide asynchronous progress. We implement the asynchronous progress ranks (PR) approach and the other approaches within the Communication Runtime for Exascale, a communication subsystem for Global Arrays. Our performance evaluation spans pure communication benchmarks, graph community detection and sparse matrix-vector multiplication kernels, and a computational chemistry application. The utility of our proposed PR-based approach is demonstrated by a 2.17x speedup on 1008 processors over the other MPI-based designs.
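To make the progress-ranks idea concrete, the following is a minimal sketch (not the paper's actual implementation or API) of how MPI_COMM_WORLD might be split into compute ranks and dedicated progress ranks, with a PGAS-style "get" expressed as plain MPI two-sided messages served by a progress rank. The rank-selection policy, the tags, and the request protocol are illustrative assumptions; it expects an even number of ranks.

/*
 * Illustrative sketch only: split MPI_COMM_WORLD into compute ranks and
 * progress ranks, then serve one-sided-style GET requests with two-sided
 * MPI. Tags, policy, and protocol are assumptions, not the paper's design.
 */
#include <mpi.h>
#include <stdio.h>

#define TAG_GET  1   /* hypothetical request tag  */
#define TAG_DATA 2   /* hypothetical reply tag    */
#define TAG_STOP 3   /* hypothetical shutdown tag */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Toy policy: every odd rank acts as a progress rank. */
    int is_progress = (world_rank % 2 == 1);

    /* The user-visible communicator contains only compute ranks, so the
       split stays transparent to the application's own collectives. */
    MPI_Comm user_comm;
    MPI_Comm_split(MPI_COMM_WORLD, is_progress ? MPI_UNDEFINED : 0,
                   world_rank, &user_comm);

    if (is_progress) {
        /* Progress rank: host a slab of "global" memory and serve GET
           requests with two-sided receives until told to stop. */
        double slab[1024];
        for (int i = 0; i < 1024; i++) slab[i] = world_rank + i * 0.001;

        while (1) {
            MPI_Status st;
            int offset;
            MPI_Recv(&offset, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            MPI_Send(&slab[offset], 1, MPI_DOUBLE, st.MPI_SOURCE,
                     TAG_DATA, MPI_COMM_WORLD);
        }
    } else {
        /* Compute rank: a "get" becomes a send/recv pair with the
           progress rank that owns the data (here: world_rank + 1). */
        int target = world_rank + 1, offset = 7;
        double value;
        MPI_Send(&offset, 1, MPI_INT, target, TAG_GET, MPI_COMM_WORLD);
        MPI_Recv(&value, 1, MPI_DOUBLE, target, TAG_DATA,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d got %f from progress rank %d\n",
               world_rank, value, target);

        MPI_Send(&offset, 1, MPI_INT, target, TAG_STOP, MPI_COMM_WORLD);
        MPI_Comm_free(&user_comm);
    }

    MPI_Finalize();
    return 0;
}

Because the progress rank sits in its own MPI process, the receive loop advances independently of the compute ranks' work, which is the asynchronous-progress property the abstract attributes to the PR design; the actual runtime described in the paper performs this split automatically and hides it from the user.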