Abstract
Because real-time compression of modern image and video data streams requires throughputs above 1 gigapixel/s, parallel encoding and decoding are unavoidable. A well-established technique for parallel decoding is to insert markers into the variable length code (VLC) stream: the decoder locates the markers, extracts the sub-streams between them, and decodes these sub-streams in parallel. However, markers degrade compression, especially when a high degree of parallelism is required. In this paper, we propose an architecture for marker-free parallel decoding of VLC streams. Instead of multiple local entropy decoders, the proposed architecture uses a single parallel entropy decoder together with a novel format for constructing the VLC stream. The approach sustains high clock rates while scaling to a large number of parallel decoders. A synthesized clock frequency well above 110 MHz is achieved for up to 20 decoders on a medium-sized FPGA.
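To make the cost of the marker-based baseline concrete, here is a minimal sketch of that well-established technique. It is not the marker-free architecture proposed in this paper. The sketch rests on several assumptions:

- The VLC is a Golomb-Rice code with parameter k.
- Sub-streams are separated by a JPEG-style two-byte restart marker 0xFFD0.
- The encoder pads each segment to a byte boundary with 1-bits.
- The encoder stuffs 0x00 after every 0xFF data byte so that the payload never imitates a marker.

All function names are hypothetical. The thread pool only stands in for independent hardware decoders, and the marker search itself is a sequential scan.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: a JPEG-style two-byte restart marker (RST0) separates sub-streams.
MARKER = b"\xff\xd0"


def rice_encode(values, k):
    """Golomb-Rice code: q ones, a '0' terminator, then the k-bit remainder."""
    bits = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        bits.extend([1] * q + [0])
        bits.extend((r >> i) & 1 for i in reversed(range(k)))
    return bits


def pack(bits):
    """Pad to a byte boundary with 1-bits, then stuff 0x00 after every 0xFF
    so the payload can never contain the marker pattern."""
    bits = bits + [1] * (-len(bits) % 8)
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
        if byte == 0xFF:
            out.append(0x00)
    return bytes(out)


def encode_with_markers(substreams, k):
    """Encoder side: one independently decodable segment per sub-stream."""
    return MARKER.join(pack(rice_encode(s, k)) for s in substreams)


def split_at_markers(stream):
    """Sequential scan: cut at markers and undo byte stuffing."""
    parts, cur, i = [], bytearray(), 0
    while i < len(stream):
        if stream[i] == 0xFF and i + 1 < len(stream):
            if stream[i + 1] == 0x00:        # stuffed data byte 0xFF
                cur.append(0xFF)
                i += 2
                continue
            if stream[i + 1] == MARKER[1]:   # marker: start a new sub-stream
                parts.append(bytes(cur))
                cur = bytearray()
                i += 2
                continue
        cur.append(stream[i])
        i += 1
    parts.append(bytes(cur))
    return parts


def unpack(data):
    """Bytes (already de-stuffed) to a list of bits, MSB first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def rice_decode(bits, k):
    values, pos = [], 0
    while True:
        q = 0
        while pos < len(bits) and bits[pos] == 1:
            q += 1
            pos += 1
        if pos + 1 + k > len(bits):
            return values                  # only 1-bit padding (or nothing) left
        pos += 1                           # skip the '0' terminator
        r = 0
        for _ in range(k):
            r = (r << 1) | bits[pos]
            pos += 1
        values.append((q << k) | r)


def decode_parallel(stream, k, workers=4):
    """Each extracted sub-stream is decoded independently of the others."""
    parts = split_at_markers(stream)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: rice_decode(unpack(p), k), parts))


def marker_overhead_bits(substreams, k):
    """Bits spent on markers, byte-alignment padding and stuffing."""
    payload = sum(len(rice_encode(s, k)) for s in substreams)
    return 8 * len(encode_with_markers(substreams, k)) - payload
```

Take the sub-streams [[3, 0, 5, 1], [2, 2, 7], [0, 9]] with k = 2. The Rice payload is 31 bits, but the marked stream is 10 bytes (80 bits). The 49 bits of overhead break down as follows:

- 32 bits of markers
- 9 bits of alignment padding
- one 8-bit stuffed byte

Every additional sub-stream adds at least one 16-bit marker plus padding. This illustrates why marker overhead grows with the degree of parallelism.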
Acknowledgements
This work is part of the project Intelligenter Optischer Sensor zur 2D/3D Objekt-Erfassung und dimensionellen Messtechnik (IOS23; intelligent optical sensor for 2D/3D object acquisition and dimensional metrology), which is financed by the Baden-Württemberg-Stiftung gGmbH.
Appendices
Appendix A: bit stream construction unit functionality in C#
Appendix B: symbol distribution unit functionality in C#
About this article
Cite this article
Baroud, Y., Mariños Velarde, J.M., Wang, Z. et al. Architecture for parallel marker-free variable length streams decoding. J Real-Time Image Proc 16, 2127–2146 (2019). https://doi.org/10.1007/s11554-017-0715-2
DOI: https://doi.org/10.1007/s11554-017-0715-2