A batched GEMM optimization framework for deep learning
Z Yang, L Lu, R Wang - The Journal of Supercomputing, 2022 - Springer
Abstract
Generalized matrix multiplication (GEMM) is one of the most widely used algorithms in many fields, such as deep learning, astrophysics, signal processing, and advanced physical analysis. It plays an especially important role in deep learning, particularly for convolutional neural networks, because many of the calculations involved are converted into matrix multiplications to exploit the parallel processing power of GPUs. However, the sizes of the converted matrices are generally too small to fully occupy the GPU. In this paper, we focus on the impact of GEMM on deep learning and propose a framework for computing a batch of GEMMs in one kernel function so as to increase GPU occupancy. A suite of tiling strategies is designed for batches of matrices with small dimensions and variable sizes. The tiling strategy is determined by considering the kernel occupancy of each GEMM to fit different matrix sizes and GPU architectures. GoogLeNet is then implemented using MIOpen as a representative case, and the batched GEMM framework is integrated into it. The experimental results show that, compared with MAGMA, GoogLeNet optimized with our framework achieves speedups in elapsed time of 2.60× and 2.79× on AMD Radeon Instinct MI50 and MI100 GPUs, respectively.
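The core idea described in the abstract, replacing many individual small GEMM launches with a single batched computation, can be sketched as follows. This is only a minimal CPU-side illustration using NumPy's batched `matmul`; it is not the authors' GPU kernel, and the dimensions chosen are hypothetical examples of the small conv-derived matrices the abstract mentions.

```python
import numpy as np

# Batched GEMM sketch: many small matrix products issued as one call.
# The paper's framework launches a single GPU kernel with per-matrix
# tiling; here NumPy's batched matmul stands in for that idea on a
# batch of equally sized matrices.
rng = np.random.default_rng(0)
batch, m, k, n = 64, 16, 16, 16   # small dims, typical of conv-to-GEMM

A = rng.standard_normal((batch, m, k))
B = rng.standard_normal((batch, k, n))

# One batched call instead of `batch` separate GEMMs.
C = np.matmul(A, B)               # shape: (batch, m, n)

# Equivalent loop of individual GEMMs, for comparison.
C_loop = np.stack([A[i] @ B[i] for i in range(batch)])
assert np.allclose(C, C_loop)
```

On a GPU, the analogous win comes from launching one kernel whose thread blocks tile all the matrices in the batch at once, rather than paying per-launch overhead and leaving most compute units idle on each tiny GEMM.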