Groute: An asynchronous multi-GPU programming model for irregular computations
Nodes with multiple GPUs are becoming the platform of choice for high-performance
computing. However, most applications are written using bulk-synchronous programming
models, which may not be optimal for irregular algorithms that benefit from low-latency,
asynchronous communication. This paper proposes constructs for asynchronous multi-GPU
programming, and describes their implementation in a thin runtime environment called
Groute. Groute also implements common collective operations and distributed work-lists …
computing. However, most applications are written using bulk-synchronous programming
models, which may not be optimal for irregular algorithms that benefit from low-latency,
asynchronous communication. This paper proposes constructs for asynchronous multi-GPU
programming, and describes their implementation in a thin runtime environment called
Groute. Groute also implements common collective operations and distributed work-lists …
Nodes with multiple GPUs are becoming the platform of choice for high-performance computing. However, most applications are written using bulk-synchronous programming models, which may not be optimal for irregular algorithms that benefit from low-latency, asynchronous communication. This paper proposes constructs for asynchronous multi-GPU programming, and describes their implementation in a thin runtime environment called Groute. Groute also implements common collective operations and distributed work-lists, enabling the development of irregular applications without substantial programming effort. We demonstrate that this approach achieves state-of-the-art performance and exhibits strong scaling for a suite of irregular applications on 8-GPU and heterogeneous systems, yielding over 7x speedup for some algorithms.

Showing the best result for this search. See all results