A study of network quality of service in many-core MPI applications

L Savoie, DK Lowenthal… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
2018 IEEE International Parallel and Distributed Processing …, 2018
Network contention in existing high performance computing (HPC) systems increases job execution time and reduces machine throughput. This problem is expected to become worse in future systems as core counts increase and networks become larger and more complicated. In this paper, we investigate the use of network Quality of Service (QoS) to mitigate the effects of network contention. QoS allocates bandwidth to individual jobs, thus limiting the impact that one job can have on another through network contention. We consider coarse-grained QoS, in which each job runs at a different priority level, by running a number of micro-benchmarks and applications in different QoS configurations on real hardware with QoS capabilities. Our results indicate that while network contention reduces job performance by as much as 70%, coarse-grained QoS is unlikely to improve throughput on HPC systems and may increase job execution times by more than 100%. Based on our analysis, finer-grained QoS is more likely to improve performance and throughput.