Automatic Generation of Algorithms for High-Speed Reliable Lossy Compression


Project Summary

Fast, reliable data compression is urgently needed for many leading-edge scientific instruments and for exascale high-performance computing applications because they produce vast amounts of data at extremely high rates. The goal of this project is to develop a high-speed, reliable lossy-compression framework named LC that meets three critical needs: (i) improving the trustworthiness of lossy compression methods and the quality of the data reduction, (ii) increasing the compression/decompression speed to match the high data generation/acquisition rates, and (iii) supporting progressive compression and decompression with multiple levels of resolution to meet the demands of today's leading scientific applications and instruments.

The project comprises three research thrusts.

(1) Trustworthiness and data-reduction quality: The LC framework will let users synthesize customized algorithms for the coding stage of the lossy-compression pipeline, optimizing quality-of-interest preservation and the compression ratio. To this end, LC will provide a very large tradeoff space with numerous coding algorithms to choose from and will automatically emit the code of the optimal configuration with reliable execution-time bounds.

(2) Speed: We will develop lightweight error-bounded decorrelation strategies, high-speed data predictors, efficient quantization methods, and a new class of 'essentially lossless' encoders that compress faster and better than the current state of the art. We will also parallelize the LC framework as well as the generated compression/decompression codes for both CPUs and GPUs and will create algorithms that are portable across heterogeneous architectures.

(3) Progressive compression: To enable users to build their own multi-resolution progressive compressors, we will extend LC to generate progressive algorithms that adaptively meet user requirements by employing a hierarchical block-wise tree-based structure that can suppress subtrees on demand.

The resulting fast, reliable lossy-compression framework will greatly benefit the many scientific applications that need not only high trustworthiness but also high performance. A minimal sketch of the pipeline idea appears below.
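To make the pipeline-composition idea concrete, the following minimal C++ sketch chains an error-bounded uniform quantizer with a delta predictor and verifies the point-wise error bound after inverting the stages. This is our illustration only; the function names and the specific quantizer and predictor are hypothetical stand-ins, not LC's actual components or API.

    #include <cassert>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Error-bounded uniform quantization with bin width 2*eb, which
    // guarantees |x - dequantize(quantize(x))| <= eb for each value.
    std::vector<long long> quantize(const std::vector<double>& in, double eb) {
      std::vector<long long> out(in.size());
      for (size_t i = 0; i < in.size(); i++) out[i] = std::llround(in[i] / (2 * eb));
      return out;
    }

    std::vector<double> dequantize(const std::vector<long long>& in, double eb) {
      std::vector<double> out(in.size());
      for (size_t i = 0; i < in.size(); i++) out[i] = in[i] * 2 * eb;
      return out;
    }

    // Delta predictor (decorrelation): replace each value by its difference
    // from the previous one; smooth inputs yield many small residuals that
    // a downstream encoder stage can compress well.
    std::vector<long long> deltaEncode(std::vector<long long> v) {
      for (size_t i = v.size(); i-- > 1; ) v[i] -= v[i - 1];
      return v;
    }

    std::vector<long long> deltaDecode(std::vector<long long> v) {
      for (size_t i = 1; i < v.size(); i++) v[i] += v[i - 1];
      return v;
    }

    int main() {
      const std::vector<double> data = {1.000, 1.007, 1.013, 1.020, 5.000, 5.000, 5.000};
      const double eb = 0.01;  // user-requested absolute point-wise error bound

      // Compression side: quantize, then decorrelate (a real pipeline would
      // next run an encoder stage, e.g. run-length or entropy coding).
      auto residuals = deltaEncode(quantize(data, eb));

      // Decompression side: invert the stages in reverse order.
      auto reconstructed = dequantize(deltaDecode(residuals), eb);

      // Verify that the error bound holds for every reconstructed value.
      for (size_t i = 0; i < data.size(); i++)
        assert(std::fabs(data[i] - reconstructed[i]) <= eb);
      std::puts("error bound respected");
      return 0;
    }

This sketch hard-wires a single stage combination; the point of LC is to search a very large space of such combinations automatically and emit optimized code for the configuration that best preserves the user's quantities of interest.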

DOE press release

Texas State press release

Project summary slide

LC framework overview slide


Publications

Noushin Azami, Rain Lawson, and Martin Burtscher. LICO: An Effective, High-Speed, Lossless Compressor for Images. Proceedings of the 2024 Data Compression Conference. March 2024. [DCC'24]

Brandon A. Burtchell and Martin Burtscher. Using Machine Learning to Predict Effective Compression Algorithms for Heterogeneous Datasets. Proceedings of the 2024 Data Compression Conference. March 2024. [DCC'24]

Noushin Azami and Martin Burtscher. Compressed In-memory Graphs for Accelerating GPU-based Analytics. Proceedings of the 12th SC Workshop on Irregular Applications: Architectures and Algorithms. November 2022. [IA'22]

Code Releases

The LC Framework for Generating Efficient Data-Compression Algorithms: LC framework code

High-Speed Lossless Image Compression: LICO code

Compressed In-memory Graphs for Accelerating GPU-based Analytics: MPLG code


Team

Martin Burtscher (PI)
Sheng Di (Co-PI)
Franck Cappello (senior advisor)
Noushin Azami (Ph.D. student)
Brandon Burtchell (Ph.D. student)
Alex Fallin (Ph.D. student)
Benila Jerald (Ph.D. student)
Yiqian Liu (Ph.D. student)
Andrew Rodriguez (Ph.D. student)

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR), under contracts DE-SC0022223 and DE-AC02-06CH11357.

Official Texas State University Disclaimer