Architecture for parallel marker-free variable length streams decoding

Baroud, Yousef; Mariños Velarde, José Manuel; Wang, Zhe; Kieß, Steffen; Najmabadi, Seyyed Mahdi; Guhathakurta, Jajnabalkya; Simon, Sven

doi:10.1007/s11554-017-0715-2

Original Research Paper
Published: 19 September 2017

Journal of Real-Time Image Processing, Volume 16, pages 2127–2146 (2019)

Yousef Baroud¹,
José Manuel Mariños Velarde¹,
Zhe Wang¹,
Steffen Kieß¹,
Seyyed Mahdi Najmabadi¹,
Jajnabalkya Guhathakurta¹ &
Sven Simon¹

Abstract

Because real-time compression of modern image and video data streams requires throughput above 1 gigapixel/s, parallel encoding and decoding are inevitable. A well-established technique for parallel decoding is to insert markers into the variable length code (VLC) stream. Locating the markers makes it possible to extract sub-streams, which are then decoded in parallel. However, markers adversely affect compression, especially when a high degree of parallelism is required. In this paper, we propose an architecture for marker-free parallel decoding of VLC streams. Instead of multiple local entropy decoders, the proposed architecture uses a single parallel entropy decoder in conjunction with a novel format for constructing the VLC stream. The approach runs at high clock rates and scales to a large number of decoders: a synthesized clock frequency well above 110 MHz is achieved for up to 20 decoders on a medium-sized FPGA.
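
The marker-based baseline described in the abstract can be illustrated with a minimal Python sketch. It is not the proposed marker-free architecture, whose stream format is not given on this page. The sketch assumes Golomb-Rice codes as the VLC, uses a fixed-width length field in front of each sub-stream as a stand-in for a marker, and uses a thread pool to stand in for parallel hardware decoders. The names `K`, `LEN_BITS`, `rice_encode`, `rice_decode`, `encode_with_markers` and `decode_with_markers` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

K = 2          # Rice parameter (assumed for illustration, must be >= 1 here)
LEN_BITS = 16  # width of the length field that stands in for a marker (assumed)

def rice_encode(value, k=K):
    """Return the Golomb-Rice code of a non-negative integer as a bit string."""
    q, r = value >> k, value & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")

def rice_decode(bits, k=K):
    """Sequentially decode a bit string of concatenated Golomb-Rice codes."""
    out, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":  # unary part: count the leading ones
            q += 1
            i += 1
        i += 1                 # skip the '0' that terminates the unary part
        out.append((q << k) | int(bits[i:i + k], 2))
        i += k
    return out

def encode_with_markers(symbols, n_streams):
    """Split symbols into chunks and prefix each chunk with its bit length."""
    chunk = -(-len(symbols) // n_streams)  # ceiling division
    stream = ""
    for s in range(0, len(symbols), chunk):
        payload = "".join(rice_encode(v) for v in symbols[s:s + chunk])
        stream += format(len(payload), f"0{LEN_BITS}b") + payload
    return stream

def decode_with_markers(stream):
    """Locate sub-streams via their length fields, then decode them in parallel."""
    subs, i = [], 0
    while i < len(stream):
        n = int(stream[i:i + LEN_BITS], 2)
        i += LEN_BITS
        subs.append(stream[i:i + n])
        i += n
    with ThreadPoolExecutor() as pool:
        decoded = pool.map(rice_decode, subs)  # results keep sub-stream order
    return [v for part in decoded for v in part]

symbols = [3, 0, 7, 1, 12, 5, 2, 9]
plain = "".join(rice_encode(v) for v in symbols)
for n in (1, 2, 4, 8):
    marked = encode_with_markers(symbols, n)
    assert decode_with_markers(marked) == symbols
    print(n, "sub-streams:", len(marked) - len(plain), "overhead bits")
```

In this sketch the overhead is `LEN_BITS` per sub-stream, so it grows linearly with the number of sub-streams while the payload stays the same size. That is the compression penalty the abstract attributes to markers at high degrees of parallelism.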

Acknowledgements

This work is part of the project Intelligenter Optischer Sensor zur 2D/3D Objekt-Erfassung und dimensionellen Messtechnik (IOS23) which is financed by the Baden-Württemberg-Stiftung gGmbH.

Author information

Authors and Affiliations

Institut für Parallele und Verteilte Systeme, University of Stuttgart, Stuttgart, Germany
Yousef Baroud, José Manuel Mariños Velarde, Zhe Wang, Steffen Kieß, Seyyed Mahdi Najmabadi, Jajnabalkya Guhathakurta & Sven Simon

Corresponding author

Correspondence to Yousef Baroud.

Appendices

Appendix A: bit stream construction unit functionality in C#

Appendix B: symbol distribution unit functionality in C#

About this article

Cite this article

Baroud, Y., Mariños Velarde, J.M., Wang, Z. et al. Architecture for parallel marker-free variable length streams decoding. J Real-Time Image Proc 16, 2127–2146 (2019). https://doi.org/10.1007/s11554-017-0715-2

Received: 27 February 2017
Accepted: 22 August 2017
Published: 19 September 2017
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11554-017-0715-2
