DOI: 10.1145/1629435.1629484

A scalable parallel H.264 decoder on the cell broadband engine architecture

Published: 11 October 2009


The H.264 video codec provides exceptional video compression while imposing dramatic increases in computational complexity over previous standards. While exploiting parallelism in H.264 is notoriously difficult, successful parallel implementations promise substantial performance gains, particularly as High Definition (HD) content penetrates a widening variety of applications. We present a highly scalable parallelization scheme implemented on IBM's multicore Cell Broadband Engine (CBE) and based on FFmpeg's open-source H.264 video decoder. We address resource limitations and complex data dependencies to achieve nearly ideal decoding speedup for the parallelizable portion of the encoded stream. Our decoder achieves better performance than previous implementations, and is deeply scalable for large-format video. We discuss architecture- and codec-specific performance optimizations, code overlays, data structures, memory access scheduling, and vectorization.
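The abstract reports near-ideal speedup only for the parallelizable portion of the stream, so any remaining serial work bounds the overall gain. The sketch below illustrates that relationship with Amdahl's law. It is not taken from the paper: the function name `amdahl_speedup`, the parallel fraction of 0.9, and the worker counts are all hypothetical values chosen for illustration, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when only `parallel_fraction` of the runtime scales.

    Assumes the parallel portion speeds up ideally (by `workers`) and the
    remaining serial portion does not speed up at all.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical parallel fraction; not a figure reported by the authors.
    for workers in (1, 2, 4, 8, 16):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

Under these assumptions, adding workers yields diminishing returns once the serial portion dominates, which is why a speedup figure for the parallelizable portion alone should be read separately from whole-decoder speedup.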









    Published In

    CODES+ISSS '09: Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
    October 2009
    498 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]



    Association for Computing Machinery

    New York, NY, United States




    Author Tags

    1. H.264
    2. MPEG4
    3. cell broadband engine
    4. code overlay
    5. multicore
    6. parallel
    7. scalable
    8. video


    • Research-article


    ESWeek '09: Fifth Embedded Systems Week
    October 11 - 16, 2009
    Grenoble, France

    Acceptance Rates

    Overall Acceptance Rate 280 of 864 submissions, 32%



    Bibliometrics & Citations


    Article Metrics

    • Downloads (Last 12 months): 4
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 10 Feb 2025



    Cited By

    • (2014) Parallel optimization of motion estimation for video coding on cell BE processors. 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pp. 1-6. DOI: 10.1109/ICMEW.2014.6890651. Online publication date: Jul-2014
    • (2014) Architectural Decomposition of Video Decoders by Means of an Intermediate Data Stream Format. Journal of Signal Processing Systems 75:1, pp. 65-84. DOI: 10.1007/s11265-013-0792-9. Online publication date: 1-Apr-2014
    • (2013) HD video decoding scheme based on mobile heterogeneous system architecture. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2761-2765. DOI: 10.1109/ICASSP.2013.6638159. Online publication date: May-2013
    • (2012) AVid: Annotation driven video decoding for hybrid memories. 2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia, pp. 2-11. DOI: 10.1109/ESTIMedia.2012.6507022. Online publication date: Oct-2012
    • (2011) A QHD-capable parallel H.264 decoder. Proceedings of the international conference on Supercomputing, pp. 317-326. DOI: 10.1145/1995896.1995945. Online publication date: 31-May-2011
    • (2011) Multi-ASIP based parallel and scalable implementation of motion estimation kernel for high definition videos. 2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia, pp. 56-65. DOI: 10.1109/ESTIMedia.2011.6088526. Online publication date: Oct-2011
    • (2010) Scheduling of synchronous data flow models on scratchpad memory based embedded processors. Proceedings of the International Conference on Computer-Aided Design, pp. 205-212. DOI: 10.5555/2133429.2133471. Online publication date: 7-Nov-2010
    • (2010) Parallelizing the H.264 decoder on the cell BE architecture. Proceedings of the tenth ACM international conference on Embedded software, pp. 49-58. DOI: 10.1145/1879021.1879029. Online publication date: 24-Oct-2010
    • (2010) An elastic software cache with fast prefetching for motion compensation in video decoding. Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, pp. 23-32. DOI: 10.1145/1878961.1878967. Online publication date: 24-Oct-2010
    • (2010) Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine. Proceedings of the 24th ACM International Conference on Supercomputing, pp. 105-114. DOI: 10.1145/1810085.1810102. Online publication date: 2-Jun-2010
