Stragglers in Distributed Matrix Multiplication

R Nissim, O Schwartz - Workshop on Job Scheduling Strategies for …, 2023 - Springer
Workshop on Job Scheduling Strategies for Parallel Processing, 2023, Springer
Abstract
A delay in a single processor may affect an entire system since the slowest processor typically determines the runtime. Problems with such stragglers are often mitigated using dynamic load balancing or redundancy solutions such as task replication. Unfortunately, the former option incurs high communication cost, and the latter significantly increases the arithmetic cost and memory footprint, making high resource overhead seem inevitable. Matrix multiplication and other numerical linear algebra kernels typically have structures that allow better straggler management. Redundancy-based solutions tailored for such algorithms often combine codes in the algorithm’s structure. These solutions add fixed cost overhead and may perform worse than the original algorithm when little or no delays occur. We propose a new load-balancing solution tailored for distributed matrix multiplication. Our solution reduces latency overhead by $O(P/\log P)$ compared to existing dynamic load-balancing solutions, where $P$ is the number of processors. Our solution overtakes redundancy-based solutions in all parameters: arithmetic cost, bandwidth cost, latency cost, memory footprint, and the number of stragglers it can tolerate. Moreover, our overhead costs depend on the severity of delays and are negligible when delays are minor. We compare our solution with previous ones and demonstrate significant improvements in asymptotic analysis and simulations: up to 4.4× and 5.3× compared to general-purpose dynamic load balancing and redundancy-based solutions, respectively.
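The abstract's opening claim, that the slowest processor typically determines the runtime, can be illustrated with a toy model. The sketch below is not the authors' algorithm: it assumes each processor has a fixed relative speed, ignores all communication cost, and uses hypothetical function names (`static_makespan`, `balanced_makespan`) and made-up numbers. It only contrasts an equal static split of the work with an idealized split proportional to processor speed.

```python
def static_makespan(total_work, speeds):
    """Runtime when work is split equally, regardless of processor speed.

    Every processor receives total_work / P units, so the slowest
    processor finishes last and sets the overall runtime.
    """
    share = total_work / len(speeds)
    return max(share / s for s in speeds)


def balanced_makespan(total_work, speeds):
    """Runtime under an idealized, communication-free rebalancing.

    Work is divided in proportion to speed, so all processors finish
    at the same time: total_work / (sum of speeds).
    """
    return total_work / sum(speeds)


if __name__ == "__main__":
    # Assumed example: eight processors, one straggler at quarter speed.
    speeds = [1.0] * 7 + [0.25]
    total_work = 800.0

    print(static_makespan(total_work, speeds))    # 400.0
    print(balanced_makespan(total_work, speeds))  # about 110.3
```

In this toy setting a single straggler at quarter speed makes the static schedule four times slower than the straggler-free case (400 versus 100 time units), while ideal rebalancing stays near 110. Real load balancing pays communication cost to approach the ideal figure, which is the overhead the abstract refers to.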