research-article

Practical speculative parallelization of variable-length decompression algorithms

Authors:

Jae W. Lee

LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems

Pages 55-64

https://doi.org/10.1145/2491899.2465557

Published: 20 June 2013

Abstract

Variable-length coding is widely used for efficient data compression. Typically, the compressor splits the original data into blocks and compresses each block with variable-length codes, hence producing variable-length compressed blocks. Although the compressor can easily exploit ample block-level parallelism, it is much more difficult to extract such coarse-grain parallelism from the decompressor because a block boundary cannot be located until decompression of the previous block is completed. This paper presents novel algorithms to efficiently predict block boundaries and a runtime system, called SDM, that enables efficient block-level parallel decompression. The SDM execution model features speculative pipelining with three stages: Scanner, Decompressor, and Merger. The scanner stage employs a high-confidence prediction algorithm that finds compressed block boundaries without fully decompressing individual blocks. This information is communicated to the parallel decompressor stage, in which multiple blocks are decompressed in parallel. The decompressed blocks are merged in order by the merger stage to produce the final output. The SDM runtime is specialized to execute this pipeline correctly and efficiently on resource-constrained embedded platforms. With SDM we effectively parallelize three production-grade variable-length decompression algorithms (zlib, bzip2, and H.264), achieving maximum speedups of 2.50× and 8.53× (and geometric mean speedups of 1.96× and 4.04×) on 4-core and 36-core embedded platforms, respectively.
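As a rough illustration of the Scanner/Decompressor/Merger structure described above, the Python sketch below decodes a hypothetical container in which each block is a 4-byte marker (`MAGIC`) followed by an independent zlib stream, with no stored block lengths. This format is an assumption made for illustration only; it is not the SDM runtime, and it is not the zlib, bzip2, or H.264 stream format the paper targets. The scanner predicts boundaries by searching for the marker, which can also match bytes inside a compressed payload; the decompressor stage speculatively decodes every candidate on a thread pool; and the merger follows the chain of verified block ends from offset 0, so results for false-positive candidates are discarded. All names (`MAGIC`, `compress_blocks`, `scan`, `try_decompress`, `parallel_decompress`) are hypothetical, and for simplicity the stages run one after another rather than as an overlapped pipeline.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 4-byte marker written before every compressed block.
# Compressed payloads can contain these bytes by chance, so the scanner
# may report false-positive boundaries.
MAGIC = b"\xb1\x0c\x4b\x2d"


def compress_blocks(data: bytes, block_size: int = 4096) -> bytes:
    """Compress fixed-size blocks independently as MAGIC + zlib stream.

    No block lengths are stored, so a sequential decoder cannot locate
    block k+1 without first decoding block k."""
    return b"".join(
        MAGIC + zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    )


def scan(stream: bytes) -> list[int]:
    """Scanner: predict block starts by searching for MAGIC (may over-report)."""
    candidates = []
    pos = stream.find(MAGIC)
    while pos != -1:
        candidates.append(pos)
        pos = stream.find(MAGIC, pos + 1)
    return candidates


def try_decompress(stream: bytes, start: int):
    """Decompressor: speculatively decode one block at a predicted start.

    Returns (output, end_offset), or None if no valid block starts there."""
    if stream[start:start + len(MAGIC)] != MAGIC:
        return None
    d = zlib.decompressobj()
    try:
        out = d.decompress(stream[start + len(MAGIC):])
    except zlib.error:
        return None
    if not d.eof:  # incomplete zlib stream: not a whole block
        return None
    # Bytes after the end of this zlib stream are left in unused_data.
    return out, len(stream) - len(d.unused_data)


def parallel_decompress(stream: bytes, workers: int = 4) -> bytes:
    candidates = scan(stream)
    # Decompressor stage: decode all candidates concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        decoded = pool.map(lambda s: try_decompress(stream, s), candidates)
        results = dict(zip(candidates, decoded))
    # Merger stage: follow verified block ends from offset 0, in order.
    # Candidates never reached this way were mis-speculations; their
    # results are discarded.
    out, pos = [], 0
    while pos < len(stream):
        r = results.get(pos)
        if r is None:
            # Boundary missed by the scanner: fall back to sequential decoding.
            r = try_decompress(stream, pos)
            if r is None:
                raise ValueError(f"corrupt or unexpected data at offset {pos}")
        block, pos = r
        out.append(block)
    return b"".join(out)
```

Because every true boundary in this toy format begins with the marker, the scanner never misses one; its only cost is wasted work on false positives. The sequential fallback in the merger covers predictors that, unlike this one, can miss boundaries. Whether the thread pool yields a real speedup depends on the Python runtime; the sketch is about control flow and correctness, not performance.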

Cited By

W. Sodsong, M. Jung, J. Park, and B. Burgstaller. JParEnt: Parallel entropy decoding for JPEG decompression on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, 29(15), 2017. https://doi.org/10.1002/cpe.4111
A. Estebanez, D. Llanos, and A. Gonzalez-Escribano. A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys, 49(2):1-39, 2016. https://dl.acm.org/doi/10.1145/2938369
W. Song, J. Kim, J. Lee, and D. Abts. Security Vulnerability in Processor-Interconnect Router Design. In Proc. of the 2014 ACM SIGSAC Conference on Computer and Communications Security (G. Ahn, M. Yung, and N. Li, eds.), pages 358-368, 2014. https://dl.acm.org/doi/10.1145/2660267.2660290

Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language types
        1. Parallel programming languages

Information

Published In

LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems

June 2013

184 pages

ISBN:9781450320856

DOI:10.1145/2491899

General Chair: Björn Franke, University of Edinburgh, UK
Program Chair: Jingling Xue, University of New South Wales, Australia

ACM SIGPLAN Notices Volume 48, Issue 5
LCTES '13
May 2013
165 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2499369

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

LCTES '13: SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2013

June 20-21, 2013

Seattle, Washington, USA

Acceptance Rates

LCTES '13 paper acceptance rate: 16 of 60 submissions (27%)

Overall acceptance rate: 116 of 438 submissions (26%)

Bibliometrics

Total citations: 14
Total downloads: 261
Downloads (last 12 months): 8
Downloads (last 6 weeks): 0

Reflects downloads up to 26 Jan 2025.

Cited By (continued)

C. Jeong, S. Hang, and B. Burgstaller. Improved Branch Prediction for Just-in-Time Decompression of Canonical Huffman Bytecode Streams. In Frontier and Innovation in Future Computing and Communications, pages 719-729, 2014. https://doi.org/10.1007/978-94-017-8798-7_82
V. Lukin, S. Krivenko, I. Kaluzhinov, O. Krylova, and L. Kryvenko. Lossy Compression of Remote Sensing and Dental Images Corrupted by Spatially Correlated Noise. In Integrated Computer Technologies in Mechanical Engineering - 2021, pages 1003-1014, 2022. https://doi.org/10.1007/978-3-030-94259-5_77
G. Hwang, K. Cho, C. Han, H. Oh, Y. Yoon, and S. Lee. Lossless Decompression Accelerator for Embedded Processor with GUI. Micromachines, 12(2):145, 2021. https://doi.org/10.3390/mi12020145
M. Ledwon, B. Cockburn, and J. Han. High-Throughput FPGA-Based Hardware Accelerators for Deflate Compression and Decompression Using High-Level Synthesis. IEEE Access, 8:62207-62217, 2020. https://doi.org/10.1109/ACCESS.2020.2984191
J. Fang, J. Chen, J. Lee, Z. Al-Ars, and H. Hofstee. An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic. Journal of Signal Processing Systems, 92(9):931-947, 2020. https://doi.org/10.1007/s11265-020-01547-w
S. Ji, Y. Zhao, and Q. Yi. Accelerating parallel graph computing with speculation. In Proc. of the 16th ACM International Conference on Computing Frontiers (F. Palumbo, M. Becchi, M. Schulz, and K. Sato, eds.), pages 115-124, 2019. https://dl.acm.org/doi/10.1145/3310273.3323049
Z. Wang, H. Wang, and J. Li. A Speculative Parallel Optimization Method for Industrial Big Data Algorithms. In 2019 IEEE International Conference on Industrial Internet (ICII), pages 417-422, 2019. https://doi.org/10.1109/ICII.2019.00077
