Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs

Rivera, Cody; Di, Sheng; Tian, Jiannan; Yu, Xiaodong; Tao, Dingwen; Cappello, Franck

arXiv:2201.09118 (cs)

[Submitted on 22 Jan 2022 (v1), last revised 10 Mar 2022 (this version, v2)]

Abstract: More and more HPC applications require fast and effective compression techniques to handle large volumes of data in storage and transmission. Not only do these applications need to compress the data effectively during simulation, but they also need to perform decompression efficiently for post hoc analysis. SZ is an error-bounded lossy compressor for scientific data, and cuSZ is a version of SZ designed to take advantage of the GPU's power. At present, cuSZ's compression performance has been optimized significantly, while its decompression performance remains considerably lower because of its sophisticated lossless compression step -- a customized Huffman decoding. In this work, we aim to significantly improve the Huffman decoding performance for cuSZ, thus improving the overall decompression performance in turn. To this end, we first investigate two state-of-the-art GPU Huffman decoders in depth. Then, we propose a deep architectural optimization for both algorithms. Specifically, we take full advantage of CUDA GPU architectures by using shared memory in the decoding/writing phases, tuning the amount of shared memory to use online, improving memory access patterns, and reducing warp divergence. Finally, we evaluate our optimized decoders on an NVIDIA V100 GPU using eight representative scientific datasets. Our new decoding solution obtains an average speedup of 3.64X over cuSZ's Huffman decoder and improves its overall decompression performance by 2.43X on average.
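
For readers unfamiliar with the lossless step the abstract refers to, the following is a minimal, sequential Python sketch of Huffman encoding and decoding over a small stream of integer symbols. It is an illustration only, not cuSZ's decoder or either of the GPU decoders studied in the paper; the function names (`build_codebook`, `encode`, `decode`) and the sample symbol stream are hypothetical. It shows why decoding is hard to parallelize: codewords have variable length, so the start of each codeword is only known after the previous one has been decoded.

```python
import heapq
from collections import Counter


def build_codebook(symbols):
    """Build a prefix-free Huffman codebook {symbol: bitstring}."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Prepend one bit to every code in each merged subtree.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, codebook):
    """Concatenate the codewords of all symbols into one bitstring."""
    return "".join(codebook[s] for s in symbols)


def decode(bits, codebook):
    """Decode bit by bit; each codeword boundary depends on the previous one."""
    inverse = {code: sym for sym, code in codebook.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bitstream ends in the middle of a codeword")
    return out


# Hypothetical symbol stream, skewed toward one value.
symbols = [0, 0, 0, 0, 1, 1, -1, 2]
codebook = build_codebook(symbols)
bits = encode(symbols, codebook)
restored = decode(bits, codebook)
```

In this sketch the loop in `decode` carries a dependency from one codeword to the next, which is the property that GPU decoders have to work around.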

Comments:	11 pages, 5 figures, 5 tables, accepted by IEEE IPDPS'22
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2201.09118 [cs.DC]
	(or arXiv:2201.09118v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2201.09118

Submission history

From: Dingwen Tao
[v1] Sat, 22 Jan 2022 19:18:18 UTC (560 KB)
[v2] Thu, 10 Mar 2022 01:06:52 UTC (559 KB)
