Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks

Z Jia, S Lin, CR Qi, A Aiken - ICML, 2018 - proceedings.mlr.press
Abstract
The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, these approaches result in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism, which allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, and achieving better scalability to multiple GPUs, while maintaining the original network accuracy.
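The abstract's central idea, choosing a parallelization strategy per layer by optimizing a cost over the layer graph, can be illustrated with a small sketch. The snippet below is not the paper's algorithm or cost model; it is a hypothetical dynamic-programming search over a simplified linear chain of layers, with made-up compute and communication costs, intended only to show the shape of the per-layer optimization.

```python
# Illustrative sketch: pick a parallelization strategy per layer by minimizing
# a (hypothetical) compute + communication cost over a linear layer chain.
# Strategy labels, cost functions, and the toy network are assumptions for
# illustration, not the paper's actual cost model or graph search.

STRATEGIES = ["data_parallel", "model_parallel"]

def compute_cost(layer, strategy):
    """Hypothetical per-layer execution cost under a given strategy."""
    # Assume model parallelism favors parameter-heavy (dense-like) layers,
    # while data parallelism favors compute-heavy conv-like layers.
    if strategy == "model_parallel":
        return layer["flops"] / max(layer["params"], 1) * 1e6
    return layer["flops"] / max(layer["batch"], 1)

def comm_cost(prev_layer, prev_strategy, strategy):
    """Hypothetical transfer cost between two adjacent layers."""
    if prev_strategy == strategy:
        return 0.1 * prev_layer["activations"]  # same layout: cheap hand-off
    return 1.0 * prev_layer["activations"]      # layout change: re-shard activations

def best_strategies(layers):
    """Dynamic programming over a linear chain: for each layer, keep the
    cheapest plan ending in each strategy."""
    # dp[s] = (total cost, strategy sequence) for plans whose last layer uses s
    dp = {s: (compute_cost(layers[0], s), [s]) for s in STRATEGIES}
    for prev, layer in zip(layers, layers[1:]):
        new_dp = {}
        for s in STRATEGIES:
            candidates = [
                (cost + comm_cost(prev, p, s) + compute_cost(layer, s), seq + [s])
                for p, (cost, seq) in dp.items()
            ]
            new_dp[s] = min(candidates, key=lambda x: x[0])
        dp = new_dp
    return min(dp.values(), key=lambda x: x[0])

if __name__ == "__main__":
    # Toy network: two conv-like layers followed by a dense-like layer.
    net = [
        {"flops": 1e9, "params": 1e4, "batch": 64, "activations": 1e6},
        {"flops": 8e8, "params": 2e4, "batch": 64, "activations": 5e5},
        {"flops": 2e8, "params": 1e7, "batch": 64, "activations": 1e4},
    ]
    cost, plan = best_strategies(net)
    print("per-layer plan:", plan, "estimated cost:", cost)
```

Under these toy costs the search tends to assign data parallelism to the conv-like layers and model parallelism to the dense-like layer, which is the kind of mixed, per-layer assignment the abstract argues a single global strategy cannot express.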