Abstract
The microprocessor industry's trend towards many-core architectures has made it necessary to devise appropriately scalable applications. In software-based video decoding, the main challenges are the optimized partitioning of decoder operations, the efficient tracking of dependencies and the synchronization of resources among multiple parallel units. The same applies to hardware implementations of video decoders, where monolithic approaches impede the scalability of the design and the reusability of already implemented core components.

In this paper, we propose an intermediate data stream format (Meta Format Stream) that is suited to the architectural decomposition of video decoding, replacing the conventional monolithic decoder architecture with a pipelined structure. The Meta Format is forward-oriented, self-contained and multistandard-capable, so that the processing of Meta Streams is independent of the originating bit stream. Our approach does not require special coding settings and is applicable to accelerated decoding of any standards-compliant bit stream.

An H.264/AVC multiprocessing proposal is presented as a case study to demonstrate the potential of our concept. The case study combines coarse-grained frame-level parallel decoding of the bit stream with fine-grained macroblock-level parallelism in the image processing stage. The proposed H.264 decoder achieved speedup factors of up to 7.6 on an 8-core machine with 2-way SMT. We report actual decoding speeds of up to 150 frames per second at 2160p resolution.
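Fine-grained macroblock-level parallelism relies on knowing which neighbouring macroblocks must be finished before a given macroblock can be processed. The following is a minimal sketch that makes several assumptions:

* It uses the 2D-wave dependency pattern commonly used for H.264. Under this pattern, a macroblock waits for its left, top-left, top and top-right neighbours.
* It ignores entropy decoding and inter-frame dependencies.

Under these assumptions, the sketch computes the earliest step at which each macroblock could be processed. The functions wave_schedule and wave_stats are hypothetical illustrations and are not taken from the authors' decoder.

```python
from collections import Counter


def wave_schedule(mb_w: int, mb_h: int) -> list[list[int]]:
    """Earliest processing step of every macroblock (hypothetical helper).

    Assumes the 2D-wave pattern: MB (x, y) may start only after its left,
    top-left, top and top-right neighbours (intra prediction, motion-vector
    prediction, deblocking). Inter-frame dependencies are ignored.
    """
    t = [[0] * mb_w for _ in range(mb_h)]
    for y in range(mb_h):              # raster order: neighbours are already done
        for x in range(mb_w):
            deps = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
            ready = [t[dy][dx] + 1 for dx, dy in deps if 0 <= dx < mb_w and 0 <= dy]
            t[y][x] = max(ready, default=0)
    return t


def wave_stats(width_px: int, height_px: int) -> tuple[int, int, int]:
    """Return (macroblocks, wavefront steps, peak MBs per step) for one frame."""
    mb_w = (width_px + 15) // 16       # H.264 macroblocks cover 16x16 luma samples
    mb_h = (height_px + 15) // 16      # 1080 lines are coded as 68 MB rows (1088)
    t = wave_schedule(mb_w, mb_h)
    per_step = Counter(step for row in t for step in row)
    return mb_w * mb_h, max(per_step) + 1, max(per_step.values())


print(wave_stats(1920, 1080))  # (8160, 254, 60)
```

A 1920×1080 frame is coded as 120×68 macroblocks. For such a frame, the sketch yields 254 steps with at most 60 macroblocks ready in the same step. The ramp-up and ramp-down at the frame borders keep the average at about 32 macroblocks per step.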
Notes
In interlaced coding, the reference entry may also be a field pair.
In display order: IBBPBBP… or IBBBPBBBP…
IDR = Instantaneous Decoding Refresh; an intra picture that invalidates the decoded picture buffer (DPB).
Typically 120 macroblocks per job at 1920×1080 resolution.
Blue Sky, Pedestrian Area and Riverbed provided by Taurus Media Technik; Into Tree and Parkjoy provided by SVT; all freely available.
Cite this article
Richter, H., Stabernack, B. & Kühn, V. Architectural Decomposition of Video Decoders by Means of an Intermediate Data Stream Format. J Sign Process Syst 75, 65–84 (2014). https://doi.org/10.1007/s11265-013-0792-9