Themis: Fair and efficient GPU cluster scheduling for machine learning workloads
K Mahajan, A Singhvi, A Balasubramanian, V Batra… - arXiv preprint arXiv …, 2019
Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, contention ensues when multiple such workloads run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs have long-running tasks that need to be gang-scheduled, and their performance is sensitive to tasks' relative placement.
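The gang-scheduling constraint named in the abstract is the crux of why conventional schedulers fit poorly: all of a job's tasks must start together or not at all. The minimal Python sketch below illustrates the idea under toy assumptions (this is not Themis's scheduler; the Job model, the one-GPU-per-task rule, and all names are hypothetical), showing how a job can be left waiting even when some GPUs sit free.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    num_tasks: int  # tasks that must all start simultaneously


def gang_schedule(jobs, free_gpus):
    """Toy gang scheduler: admit a job only if every one of its tasks
    can be placed at once (one GPU per task); never place partially."""
    running, waiting = [], []
    for job in jobs:
        if job.num_tasks <= free_gpus:
            free_gpus -= job.num_tasks
            running.append(job.name)
        else:
            # Not enough GPUs for the whole gang: the job waits,
            # even though some GPUs may remain idle.
            waiting.append(job.name)
    return running, waiting, free_gpus


if __name__ == "__main__":
    jobs = [Job("resnet", 4), Job("bert", 8), Job("gan", 2)]
    running, waiting, idle = gang_schedule(jobs, free_gpus=10)
    print(running)  # ['resnet', 'gan']
    print(waiting)  # ['bert'] -- needs 8 GPUs, only 6 were left after resnet
    print(idle)     # 4 GPUs stay idle rather than run bert partially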