Pipelined parallel LZSS for streaming data compression on GPGPUs. Ozsoy, A., Swany, M., & Chauhan, A. In Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, 2012.
In this paper, we present an algorithm and the design improvements needed to port the serial Lempel-Ziv-Storer-Szymanski (LZSS) lossless data compression algorithm to a parallelized version suitable for general-purpose graphics processing units (GPGPUs), specifically for NVIDIA's CUDA framework. The two main stages of the algorithm, substring matching and encoding, are studied in detail to fit into the GPU architecture. We conducted a detailed analysis of our performance results and compared them to serial and parallel CPU implementations of the LZSS algorithm. We also benchmarked our algorithm against well-known, widely used programs: GZIP and ZLIB. We achieved up to 34x better throughput than the serial CPU implementation of the LZSS algorithm and up to 2.21x better than the parallelized version. © 2012 IEEE.
@inproceedings{Ozsoy2012,
 title = {Pipelined parallel LZSS for streaming data compression on GPGPUs},
 type = {inproceedings},
 year = {2012},
 id = {7664d342-fb4a-3f9b-a8f9-6fc41cc464c1},
 created = {2019-10-01T17:20:48.579Z},
 file_attached = {false},
 profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
 last_modified = {2019-10-01T17:23:30.829Z},
 read = {false},
 starred = {false},
 authored = {true},
 confirmed = {true},
 hidden = {false},
 citation_key = {Ozsoy2012},
 folder_uuids = {73f994b4-a3be-4035-a6dd-3802077ce863},
 private_publication = {false},
 abstract = {In this paper, we present an algorithm and the design improvements needed to port the serial Lempel-Ziv-Storer-Szymanski (LZSS) lossless data compression algorithm to a parallelized version suitable for general-purpose graphics processing units (GPGPUs), specifically for NVIDIA's CUDA framework. The two main stages of the algorithm, substring matching and encoding, are studied in detail to fit into the GPU architecture. We conducted a detailed analysis of our performance results and compared them to serial and parallel CPU implementations of the LZSS algorithm. We also benchmarked our algorithm against well-known, widely used programs: GZIP and ZLIB. We achieved up to 34x better throughput than the serial CPU implementation of the LZSS algorithm and up to 2.21x better than the parallelized version. © 2012 IEEE.},
 bibtype = {inproceedings},
 author = {Ozsoy, A. and Swany, M. and Chauhan, A.},
 booktitle = {Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS}
}
