An efficient parallelization technique for x264 encoder on heterogeneous platforms consisting of CPUs and GPUs

  • Special Issue
Journal of Real-Time Image Processing

Abstract

H.264/AVC video encoders are widely used for their high coding efficiency. Because the computational demand grows with frame resolution, which keeps increasing, accelerating H.264/AVC through parallel processing has attracted great interest. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general-purpose applications by exploiting fine-grained data parallelism. Despite extensive research efforts to accelerate the H.264/AVC algorithm on GPUs, these efforts have not achieved any speed-up over x264, which is known as the fastest CPU implementation. The main obstacles are the significant communication overhead between the host CPU and the GPU and the intra-frame dependencies in the algorithm. In this paper, we propose a novel motion-estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, that effectively hides the communication overhead between the host CPU and the GPU. Further, we incorporate a frame-level parallelization technique to improve the overall throughput. Experimental results show that our proposed H.264 encoder achieves higher performance than the x264 encoder.
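To illustrate why splitting a frame into sub-frames can hide host-to-GPU transfer time, the following is a minimal timing sketch of a generic two-stage pipeline (copy, then ME kernel). It is not the authors' implementation: the function names, the even split into sub-frames, and the 4 ms / 8 ms figures are all hypothetical, and the model assumes one transfer engine and one compute engine that can run concurrently.

```python
def serial_time(copy_ms, kernel_ms):
    """Total time when every copy must finish before any kernel starts."""
    return sum(copy_ms) + sum(kernel_ms)


def pipelined_time(copy_ms, kernel_ms):
    """Finish time when the copy of sub-frame i+1 overlaps the kernel of sub-frame i."""
    copy_done = 0.0    # time at which the latest copy has finished
    kernel_done = 0.0  # time at which the latest kernel has finished
    for c, k in zip(copy_ms, kernel_ms):
        copy_done += c  # copies run back to back on the transfer engine
        # A kernel starts once its data has arrived and the previous kernel is done.
        kernel_done = max(kernel_done, copy_done) + k
    return kernel_done


def split_evenly(total_ms, parts):
    """Hypothetical helper: divide a whole-frame cost evenly over sub-frames."""
    return [total_ms / parts] * parts


# Hypothetical whole-frame costs: 4 ms transfer, 8 ms motion estimation.
whole_frame = serial_time([4.0], [8.0])                          # 12.0 ms
sub_frames = pipelined_time(split_evenly(4.0, 4), split_evenly(8.0, 4))  # 9.0 ms
```

Under these assumptions, only the first sub-frame's copy (1 ms) stays exposed, and the remaining transfers overlap with kernel execution. Real gains depend on how evenly the work divides and on per-transfer launch overhead, which this sketch ignores.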

Acknowledgments

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H0301-12-1011). Also, this work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0013479).

Author information

Corresponding author

Correspondence to Youngmin Yi.

About this article

Cite this article

Ko, Y., Yi, Y. & Ha, S. An efficient parallelization technique for x264 encoder on heterogeneous platforms consisting of CPUs and GPUs. J Real-Time Image Proc 9, 5–18 (2014). https://doi.org/10.1007/s11554-012-0317-y

  • DOI: https://doi.org/10.1007/s11554-012-0317-y
