
Abstract

Developing parallel applications that can efficiently harness future many-core architectures is the key challenge for scalable computing systems. We contribute to this challenge by presenting a parallel implementation of H.264 that scales to a large number of cores. The algorithm exploits the fact that independent macroblocks (MBs) can be processed in parallel. Whereas a previous approach exploits only intra-frame MB-level parallelism, our algorithm exploits both intra-frame and inter-frame MB-level parallelism. It is based on the observation that inter-frame dependencies have a limited spatial range. The algorithm has been implemented on a many-core architecture consisting of NXP TriMedia TM3270 embedded processors. This required developing a subscription mechanism, in which MBs are subscribed to the kick-off lists associated with their reference MBs. Extensive simulation results show that the implementation scales very well, achieving a speedup of more than 54 on a 64-core processor, whereas the previous approach achieves a speedup of only 23 on the same configuration. Potential drawbacks of our approach, the 3D-Wave strategy, are that memory requirements increase, since many frames can be in flight at once, and that frame latency might increase. We also present scheduling policies that address these drawbacks. The results show that these policies mitigate the memory and latency issues with a negligible effect on performance scalability. We further analyze the impact of memory latency, L1 cache size, and synchronization and thread-management overhead. Finally, we present performance requirements for entropy (CABAC) decoding.
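The subscription mechanism mentioned in the abstract can be illustrated with a small single-threaded sketch. It is not the authors' TM3270 implementation: the names `MB`, `decode_all`, and `deps`, as well as the `(frame, x, y)` keys, are hypothetical and chosen only for illustration. Each MB holds a kick-off list of MBs that wait on it, and a waiting MB becomes ready once all MBs it subscribed to have been decoded.

```python
from collections import deque


class MB:
    """A macroblock with a kick-off list (hypothetical illustration type)."""

    def __init__(self, key):
        self.key = key        # assumed key layout: (frame, x, y)
        self.pending = 0      # dependencies that are not yet decoded
        self.kickoff = []     # MBs subscribed to this MB


def decode_all(deps):
    """Return one valid decode order.

    deps maps an MB key to the keys it depends on (intra-frame neighbours
    and inter-frame reference MBs). Every referenced key must also be a
    key of deps. In this sketch all subscriptions are made up front.
    """
    mbs = {key: MB(key) for key in deps}
    ready = deque()

    for key, refs in deps.items():
        mb = mbs[key]
        for ref_key in refs:
            mbs[ref_key].kickoff.append(mb)   # subscribe to the reference MB
            mb.pending += 1
        if mb.pending == 0:
            ready.append(mb)

    order = []
    while ready:
        mb = ready.popleft()
        order.append(mb.key)                  # actual decoding would go here
        for sub in mb.kickoff:                # kick off the subscribers
            sub.pending -= 1
            if sub.pending == 0:
                ready.append(sub)
        mb.kickoff.clear()
    return order


if __name__ == "__main__":
    # Two frames of two MBs each; frame 1 references frame 0.
    example = {
        (0, 0, 0): [],
        (0, 1, 0): [(0, 0, 0)],
        (1, 0, 0): [(0, 0, 0)],
        (1, 1, 0): [(1, 0, 0), (0, 1, 0)],
    }
    print(decode_all(example))
```

In the example, MB `(1, 0, 0)` of the second frame becomes ready as soon as its reference MB `(0, 0, 0)` is decoded, before the first frame is complete. That is the inter-frame parallelism the abstract describes. A real many-core implementation would hand ready MBs to worker cores and would need synchronized updates of the counters and lists.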

This work was performed while the fourth author was with NXP Semiconductors.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Azevedo, A. et al. (2011). A Highly Scalable Parallel Implementation of H.264. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers IV. Lecture Notes in Computer Science, vol 6760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24568-8_6

  • DOI: https://doi.org/10.1007/978-3-642-24568-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24567-1

  • Online ISBN: 978-3-642-24568-8

  • eBook Packages: Computer Science, Computer Science (R0)
