skip to main content
10.1145/2491899.2465557acmconferencesArticle/Chapter ViewAbstractPublication PagescpsweekConference Proceedingsconference-collections
research-article

Practical speculative parallelization of variable-length decompression algorithms

Published: 20 June 2013 Publication History

Abstract

Variable-length coding is widely used for efficient data compression. Typically, the compressor splits the original data into blocks and compresses each block with variable-length codes, hence producing variable-length compressed blocks. Although the compressor can easily exploit ample block-level parallelism, it is much more difficult to extract such coarse-grain parallelism from the decompressor because a block boundary cannot be located until decompression of the previous block is completed. This paper presents novel algorithms to efficiently predict block boundaries and a runtime system that enables efficient block-level parallel decompression, called SDM. The SDM execution model features speculative pipelining with three stages: Scanner, Decompressor, and Merger. The scanner stage employs a high-confidence prediction algorithm that finds compressed block boundaries without fully decompressing individual blocks. This information is communicated to the parallel decompressor stage in which multiple blocks are decompressed in parallel. The decompressed blocks are merged in order by the merger stage to produce the final output. The SDM runtime is specialized to execute this pipeline correctly and efficiently on resource-constrained embedded platforms. With SDM we effectively parallelize three production-grade variable-length decompression algorithms?zlib, bzip2, and H.264?with maximum speedups of 2.50× and 8.53× (and geometric mean speedups of 1.96× and 4.04×) on 4-core and 36-core embedded platforms, respectively.

References

[1]
bzip2 and libbzip. http://bzip2.org/.
[2]
gzip homepage. http://www.gzip.org/.
[3]
H.264: Advanced video coding for generic audiovisual services. http://www.itu.int/rec/T-REC-H.264/.
[4]
JPEG homepage. http://www.jpeg.org/jpeg/.
[5]
The Linux Information Project. http://linfo.org/.
[6]
Mozilla Developer Network. https://developer.mozilla.org/.
[7]
Parallel bzip2. http://compression.ca/pbzip2/.
[8]
A parallel implementation of gzip. http://zlib.net/pigz/}.
[9]
Portable Network Graphics. http://www.libpng.org/pub/png/.
[10]
Samsung Exynos 4 Quad. http://www.samsung.com/exynos/.
[11]
The Linux Kernel Archives. http://www.kernel.org/.
[12]
Tilera TILE-Gx processor family. http://www.tilera.com/.
[13]
Vorbis audio compression. http://xiph.org/vorbis/.
[14]
YUV CIF reference videos. http://trace.eas.asu.edu/yuv/}.
[15]
zlib: A massively spiffy yet delicately unobtrusive compression library. http://zlib.net/.
[16]
A. Bilas, J. Fritts, and J. P. Singh. Real-time parallel MPEG-2 decoding in software. In Proc. of IPPS, 1997.
[17]
M. T. Biskup. Guaranteed synchronization of Huffman codes. In Proc. of Data Compression Conference (DCC), 2008.
[18]
A. Gurhanli, C. C.-P. Chen, and S.-H. Hung. Coarse grain parallelization of H.264 video decoder and memory bottleneck in multi-core architectures. International Journal of Computer Theory and Engineering, 2011.
[19]
S. T. Klein and Y. Wiseman. Parallel Huffman decoding with applications to JPEG files. Computer Journal, 2003.
[20]
P. P. C. Lee, T. Bu, and G. Chandranmenon. A lock-free, cache-efficient multi-core synchronization mechanism for line-rate network traffic monitoring. In Proc. of IPDPS, 2010.
[21]
W. Liu, J. Tuck, L. Ceze, W. Ahn, K. Strauss, J. Renau, and J. Torrellas. POSH: a TLS compiler that exploits program structure. In Proc. of PPoPP, 2006.
[22]
J. Mankin, D. Kaeli, and J. Ardini. Software transactional memory for multicore embedded systems. In Proc. of LCTES, 2009.
[23]
P. Marcuello, J. Tubella, and A. Gonzalez. Value prediction for speculative multithreaded architectures. In Proc. of ISCA, 1999.
[24]
J. Nikara, S. Vassiliadis, J. Takala, M. Sima, and P. Liuha. Parallel multiple-symbol variable-length decoding. In Proc. of ICCD, 2002.
[25]
A. Raman, H. Kim, T. R. Mason, T. B. Jablin, and D. I. August. Speculative parallelization using software multi-threaded transactions. In Proc. of ASPLOS, 2010.
[26]
E. Raman, N. Vachharajani, R. Rangan, and D. I. August. Spice: speculative parallel iteration chunk execution. In Proc. of CGO, 2008.
[27]
Standard Performance Evaluation Corporation. http://www.spec.org/.
[28]
J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. Improving value communication for thread-level speculation. In HPCA, 2002.
[29]
C. Tian, M. Feng, and R. Gupta. Speculative parallelization using state separation and multiple value prediction. In Proc. of ISMM, 2010.
[30]
C. Tian, M. Feng, V. Nagarajan, and R. Gupta. Copy or discard execution model for speculative parallelization on multicores. In Proc. of MICRO, 2008.
[31]
Z. Zhao, B. Wu, and X. She. Speculative parallelization needs rigor: Probabilistic analysis for optimal speculation of finite state machine applications. In Proc. of PACT, 2012.
[32]
C. Zilles and G. Sohi. Master/slave speculative parallelization. In Proc. of MICRO, 2002.
[33]
J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theor., 23(3):337--343, Sept. 2006.

Cited By

View all
  • (2017)JParEnt: Parallel entropy decoding for JPEG decompression on heterogeneous multicore architecturesConcurrency and Computation: Practice and Experience10.1002/cpe.411129:15Online publication date: 10-Jul-2017
  • (2016)A Survey on Thread-Level Speculation TechniquesACM Computing Surveys10.1145/293836949:2(1-39)Online publication date: 30-Jun-2016
  • (2014)Security Vulnerability in Processor-Interconnect Router DesignProceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security10.1145/2660267.2660290(358-368)Online publication date: 3-Nov-2014
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
June 2013
184 pages
ISBN:9781450320856
DOI:10.1145/2491899
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 48, Issue 5
    LCTES '13
    May 2013
    165 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2499369
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. compression
  2. embedded systems
  3. multicores
  4. parallelization
  5. runtime
  6. speculation

Qualifiers

  • Research-article

Conference

LCTES '13

Acceptance Rates

LCTES '13 Paper Acceptance Rate 16 of 60 submissions, 27%;
Overall Acceptance Rate 116 of 438 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)0
Reflects downloads up to 26 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2017)JParEnt: Parallel entropy decoding for JPEG decompression on heterogeneous multicore architecturesConcurrency and Computation: Practice and Experience10.1002/cpe.411129:15Online publication date: 10-Jul-2017
  • (2016)A Survey on Thread-Level Speculation TechniquesACM Computing Surveys10.1145/293836949:2(1-39)Online publication date: 30-Jun-2016
  • (2014)Security Vulnerability in Processor-Interconnect Router DesignProceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security10.1145/2660267.2660290(358-368)Online publication date: 3-Nov-2014
  • (2014)Improved Branch Prediction for Just-in-Time Decompression of Canonical Huffman Bytecode StreamsFrontier and Innovation in Future Computing and Communications10.1007/978-94-017-8798-7_82(719-729)Online publication date: 19-Apr-2014
  • (2022)Lossy Compression of Remote Sensing and Dental Images Corrupted by Spatially Correlated NoiseIntegrated Computer Technologies in Mechanical Engineering - 202110.1007/978-3-030-94259-5_77(1003-1014)Online publication date: 21-Feb-2022
  • (2021)Lossless Decompression Accelerator for Embedded Processor with GUIMicromachines10.3390/mi1202014512:2(145)Online publication date: 31-Jan-2021
  • (2020)High-Throughput FPGA-Based Hardware Accelerators for Deflate Compression and Decompression Using High-Level SynthesisIEEE Access10.1109/ACCESS.2020.29841918(62207-62217)Online publication date: 2020
  • (2020)An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable LogicJournal of Signal Processing Systems10.1007/s11265-020-01547-w92:9(931-947)Online publication date: 28-May-2020
  • (2019)Accelerating parallel graph computing with speculationProceedings of the 16th ACM International Conference on Computing Frontiers10.1145/3310273.3323049(115-124)Online publication date: 30-Apr-2019
  • (2019)A Speculative Parallel Optimization Method for Industrial Big Data Algorithms2019 IEEE International Conference on Industrial Internet (ICII)10.1109/ICII.2019.00077(417-422)Online publication date: Nov-2019
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media