An efficient parallelization technique for x264 encoder on heterogeneous platforms consisting of CPUs and GPUs

Ko, Youngsub; Yi, Youngmin; Ha, Soonhoi

doi:10.1007/s11554-012-0317-y

Special Issue
Published: 11 January 2013

Volume 9, pages 5–18 (2014)
Journal of Real-Time Image Processing

Youngsub Ko¹,
Youngmin Yi² &
Soonhoi Ha¹

Abstract

H.264/AVC video encoders are widely used for their high coding efficiency. Because the computational demand grows with frame resolution, and resolutions keep increasing, there has been great interest in accelerating H.264/AVC encoding through parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general-purpose applications by exploiting fine-grained data parallelism. Despite extensive research efforts to accelerate the H.264/AVC algorithm on GPUs, none has achieved a speed-up over x264, which is known as the fastest CPU implementation. The main obstacles are the significant communication overhead between the host CPU and the GPU and the intra-frame dependencies in the algorithm. In this paper, we propose a novel motion-estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, that effectively hides the communication overhead between the host CPU and the GPU. Further, we incorporate a frame-level parallelization technique to improve the overall throughput. Experimental results show that our proposed H.264 encoder achieves higher performance than the x264 encoder.
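
To make the idea of hiding communication overhead by pipelining concrete, the following Python sketch models a frame passing through four sequential stages (host-to-GPU upload, GPU motion estimation, GPU-to-host download, remaining CPU encoding) and compares processing the frame as a single unit against splitting it into sub-frames. This is only an illustrative timing model, not the authors' implementation: the stage names, the function names `pipeline_makespan` and `frame_time`, and all timing values are assumptions chosen for the example, and each stage is assumed to handle one sub-frame at a time, in order.

```python
def pipeline_makespan(stage_times):
    """Return the finish time of the last chunk in an in-order linear pipeline.

    stage_times[i][s] is the time chunk i spends in stage s. Each stage
    handles one chunk at a time, and a chunk enters stage s only after it
    has left stage s-1.
    """
    n_stages = len(stage_times[0])
    stage_free_at = [0.0] * n_stages  # when each stage finished its previous chunk
    for chunk in stage_times:
        ready = 0.0  # when this chunk left the previous stage
        for s, duration in enumerate(chunk):
            start = max(ready, stage_free_at[s])
            stage_free_at[s] = start + duration
            ready = stage_free_at[s]
    return stage_free_at[-1]


def frame_time(n_subframes, upload=4.0, gpu_me=8.0, download=2.0, cpu_rest=6.0):
    """Time to finish one frame split into n_subframes equal sub-frames.

    The default per-frame stage times are hypothetical values (arbitrary
    units), not measurements from the paper.
    """
    per_subframe = [t / n_subframes for t in (upload, gpu_me, download, cpu_rest)]
    return pipeline_makespan([per_subframe] * n_subframes)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} sub-frame(s): {frame_time(n):.2f} time units")
```

With a single sub-frame, the stages run strictly one after another, so the frame time is the sum of all stage times. With more sub-frames, transfers and CPU work for one sub-frame overlap with GPU motion estimation for another, and the frame time approaches the cost of the slowest stage alone.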

Acknowledgments

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H0301-12-1011). Also, this work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0013479).

Author information

Authors and Affiliations

School of EECS, Seoul National University, Seoul, Korea
Youngsub Ko & Soonhoi Ha
School of ECE, University of Seoul, Seoul, Korea
Youngmin Yi

Corresponding author

Correspondence to Youngmin Yi.

About this article

Cite this article

Ko, Y., Yi, Y. & Ha, S. An efficient parallelization technique for x264 encoder on heterogeneous platforms consisting of CPUs and GPUs. J Real-Time Image Proc 9, 5–18 (2014). https://doi.org/10.1007/s11554-012-0317-y

Received: 23 March 2012
Accepted: 15 December 2012
Published: 11 January 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s11554-012-0317-y
