DOI: 10.1145/1810085.1810102
research-article

Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine

Published: 02 June 2010

Abstract

Developing efficient and scalable parallel applications is the key challenge for emerging many-core architectures. We investigate this challenge by implementing and comparing two parallel H.264 decoders on the Cell architecture. Future many-cores are expected to use a Cell-like local store memory hierarchy rather than a non-scalable shared memory. The two implemented parallel algorithms, the Task Pool (TP) and the novel Ring-Line (RL) approach, both exploit macroblock-level parallelism. The TP implementation follows the master-slave paradigm and is highly dynamic, so that in theory perfect load balancing can be achieved. The RL approach is distributed and more predictable in the sense that the mapping of macroblocks to processing elements is fixed. This fixed mapping makes it possible to exploit data locality better, to overlap communication with computation, and to reduce communication and synchronization overhead. While TP is more scalable in theory, RL scales better in practice: using 16 SPEs, RL obtains a scalability of 12x, while TP achieves only 10.3x. More importantly, the absolute performance of RL is much higher. Using 16 SPEs, RL achieves a throughput of 139.6 frames per second (fps), while TP achieves only 76.6 fps. A large part of this additional performance advantage is due to hiding the memory latency. From the results we conclude that, to fully leverage the performance of future many-cores, a centralized master should be avoided and the mapping of tasks to cores should be predictable so that the memory latency can be hidden.
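The figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is only an illustration: it assumes that "scalability" means speedup relative to a single SPE running the same implementation, which the abstract does not state explicitly, and the names `RESULTS`, `parallel_efficiency`, and `residual` are ours, not the paper's.

```python
# Figures quoted in the abstract for 16 SPEs.
CORES = 16
RESULTS = {
    "RL": {"scalability": 12.0, "fps": 139.6},
    "TP": {"scalability": 10.3, "fps": 76.6},
}


def parallel_efficiency(scalability, cores):
    """Fraction of ideal linear scaling achieved.

    Assumption: 'scalability' is the speedup over one core.
    """
    return scalability / cores


for name, r in RESULTS.items():
    eff = parallel_efficiency(r["scalability"], CORES)
    print(f"{name}: efficiency {eff:.1%} at {r['fps']} fps")

# How much faster RL is than TP in absolute throughput.
fps_ratio = RESULTS["RL"]["fps"] / RESULTS["TP"]["fps"]
# How much of that gap the difference in scalability alone would explain.
scal_ratio = RESULTS["RL"]["scalability"] / RESULTS["TP"]["scalability"]
# What remains must come from per-SPE efficiency rather than scaling.
residual = fps_ratio / scal_ratio

print(f"throughput ratio RL/TP:  {fps_ratio:.2f}")
print(f"scalability ratio RL/TP: {scal_ratio:.2f}")
print(f"residual factor:         {residual:.2f}")
```

Under that assumption, RL reaches 75% of ideal scaling and TP about 64%. The throughput gap is considerably larger than the scalability gap, and the residual factor is the part the abstract attributes largely to hiding the memory latency.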

Published In

ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing
June 2010
365 pages
ISBN:9781450300186
DOI:10.1145/1810085

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. H.264
  2. cell
  3. decoding
  4. parallel
  5. programming
  6. video

Qualifiers

  • Research-article

Conference

ICS'10: International Conference on Supercomputing
June 2-4, 2010
Tsukuba, Ibaraki, Japan

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2022) High-Throughput Content-Based Video Analysis Technologies. Journal of Engineering Studies 06:03 (294-306). DOI: 10.3724/SP.J.1224.2014.00294. Online publication date: 13-Oct-2022
  • (2019) Highly Parallel Line-Based Image Coding for Many Cores. IEEE Transactions on Image Processing 21:1 (196-206). DOI: 10.1109/TIP.2011.2159986. Online publication date: 1-Jan-2019
  • (2019) FunctionFlow. Frontiers of Computer Science: Selected Publications from Chinese Universities 13:1 (73-85). DOI: 10.1007/s11704-016-6286-8. Online publication date: 1-Feb-2019
  • (2018) Architectural Decomposition of Video Decoders by Means of an Intermediate Data Stream Format. Journal of Signal Processing Systems 75:1 (65-84). DOI: 10.1007/s11265-013-0792-9. Online publication date: 27-Dec-2018
  • (2015) A Frame-Parallel 2 Gpixel/s Video Decoder Chip for UHDTV and 3-DTV/FTV Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23:12 (2768-2781). DOI: 10.1109/TVLSI.2014.2385780. Online publication date: Dec-2015
  • (2015) Dynamic macroblock wavefront parallelism for parallel video coding. Journal of Visual Communication and Image Representation 28:C (36-43). DOI: 10.1016/j.jvcir.2015.01.005. Online publication date: 1-Apr-2015
  • (2013) HD video decoding scheme based on mobile heterogeneous system architecture. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (2761-2765). DOI: 10.1109/ICASSP.2013.6638159. Online publication date: May-2013
  • (2012) Efficient Parallel Framework for H.264/AVC Deblocking Filter on Many-Core Platform. IEEE Transactions on Multimedia 14:3 (510-524). DOI: 10.1109/TMM.2012.2190391. Online publication date: 1-Jun-2012
  • (2012) Hardware-Based Task Dependency Resolution for the StarSs Programming Model. Proceedings of the 2012 41st International Conference on Parallel Processing Workshops (367-374). DOI: 10.1109/ICPPW.2012.53. Online publication date: 10-Sep-2012
  • (2012) Exploiting Parallelism: the 2D-Wave. Scalable Parallel Programming Applied to H.264/AVC Decoding (35-52). DOI: 10.1007/978-1-4614-2230-3_4. Online publication date: 2-May-2012