Architectural Decomposition of Video Decoders by Means of an Intermediate Data Stream Format

Journal of Signal Processing Systems

Abstract

The microprocessor industry's trend towards many-core architectures has made appropriately scalable applications a necessity. In software-based video decoding, the main challenges are the optimized partitioning of decoder operations, efficient tracking of dependencies, and resource synchronization across multiple parallel units. The same applies to hardware implementations of video decoders, where monolithic approaches hinder scalability of the design and reuse of already implemented core components. In this paper, we propose an intermediate data stream format (Meta Format Stream) that is suited to the architectural decomposition of video decoding: it replaces the conventional monolithic decoder architecture with a pipelined structure. The Meta Format is forward-oriented, self-contained, and multistandard capable, so that processing of Meta Streams is independent of the originating bit stream. Our approach does not require special coding settings and is applicable to accelerated decoding of any standards-compliant bit stream. An H.264/AVC multiprocessing proposal is presented as a case study of the potential of our concept. The case study combines coarse-grained frame-level parallel decoding of the bit stream with fine-grained macroblock-level parallelism in the image processing stage. The proposed H.264 decoder achieved speedup factors of up to 7.6 on an 8-core machine with 2-way SMT. We report actual decoding speeds of up to 150 frames per second at 2160p resolution.
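The figures quoted in the abstract can be put in perspective with a little arithmetic. The Python sketch below is illustrative only. It assumes that 2160p means 3840×2160 luma samples and that efficiency is measured against the 8 physical cores rather than the 16 SMT hardware threads; the abstract states neither. The function names are hypothetical and not taken from the paper. The only codec fact used is that an H.264 macroblock covers 16×16 luma samples.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / cores


def macroblocks_per_frame(width: int, height: int, mb_size: int = 16) -> int:
    """Number of macroblocks per frame, rounding partial blocks up."""
    mbs_x = -(-width // mb_size)   # ceiling division
    mbs_y = -(-height // mb_size)
    return mbs_x * mbs_y


def macroblocks_per_second(width: int, height: int, fps: float) -> float:
    """Macroblock throughput needed to sustain the given frame rate."""
    return macroblocks_per_frame(width, height) * fps


if __name__ == "__main__":
    # Values from the abstract: speedup 7.6 on 8 cores, 150 fps at 2160p.
    # ASSUMPTION: 2160p is taken to be 3840x2160.
    print(f"efficiency: {parallel_efficiency(7.6, 8):.2f}")
    print(f"macroblocks per frame: {macroblocks_per_frame(3840, 2160)}")
    print(f"macroblocks per second: {macroblocks_per_second(3840, 2160, 150):.0f}")
```

Under these assumptions, a speedup of 7.6 corresponds to about 95% of linear scaling over the physical cores, and 150 frames per second corresponds to roughly 4.9 million macroblocks per second.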

Notes

  1. In interlaced coding, the reference entry might also be a field pair.

  2. In display order: IBBPBBP… or IBBBPBBBP…

  3. IDR = Instantaneous Decoding Refresh: an intra picture that invalidates the decoded picture buffer (DPB).

  4. Typically 120 macroblocks per job at 1920×1080 resolution.

  5. Blue Sky, Pedestrian Area, and Riverbed were provided by Taurus Media Technik; Into Tree and Parkjoy were provided by SVT. All are freely available.

Author information

Corresponding author

Correspondence to Henryk Richter.

About this article

Cite this article

Richter, H., Stabernack, B. & Kühn, V. Architectural Decomposition of Video Decoders by Means of an Intermediate Data Stream Format. J Sign Process Syst 75, 65–84 (2014). https://doi.org/10.1007/s11265-013-0792-9

  • DOI: https://doi.org/10.1007/s11265-013-0792-9
