Themis: Fair and efficient GPU cluster scheduling for machine learning workloads

K Mahajan, A Singhvi, A Balasubramanian, V Batra… - arXiv preprint, 2019

Themis: Fair and efficient GPU cluster scheduling

K Mahajan, A Balasubramanian, A Singhvi… - 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020 - usenix.org
Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs have long-running tasks that need to be gang-scheduled, and their performance is sensitive to tasks' relative placement.
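For context beyond this abstract snippet, the full paper's central idea is a metric called finish-time fairness: rho = T_shared / T_independent, the ratio of a job's finish time under the current shared-cluster allocation to its finish time given an exclusive 1/N share of the cluster; jobs with the highest rho are the most disadvantaged. The sketch below is only an illustration of that metric, not the paper's implementation: the linear-speedup model and all names (Job, finish_time_fairness, remaining_work) are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    remaining_work: float  # e.g., remaining GPU-hours of training on one GPU
    allocated_gpus: int    # GPUs currently allocated to this job

def finish_time_fairness(job: Job, cluster_gpus: int, num_jobs: int) -> float:
    """rho = T_shared / T_independent; rho > 1 means the job is worse off
    than it would be with an equal 1/N share of the cluster."""
    if job.allocated_gpus == 0:
        return float("inf")  # no allocation: the job never finishes under sharing
    # Simplifying assumption: throughput scales linearly with GPU count.
    t_shared = job.remaining_work / job.allocated_gpus
    t_independent = job.remaining_work / (cluster_gpus / num_jobs)
    return t_shared / t_independent

jobs = [
    Job("resnet", remaining_work=100.0, allocated_gpus=2),
    Job("bert", remaining_work=100.0, allocated_gpus=6),
]
for j in jobs:
    rho = finish_time_fairness(j, cluster_gpus=8, num_jobs=len(jobs))
    print(f"{j.name}: rho = {rho:.2f}")
# resnet: rho = 2.00 (disadvantaged); bert: rho = 0.67 (advantaged),
# so a fairness-driven scheduler would offer the next GPUs to resnet.
```

In the paper itself, rho estimates additionally account for placement sensitivity and gang scheduling rather than the linear-speedup model assumed here.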