Parallelization of Full Search Motion Estimation Algorithm for Parallel and Distributed Platforms

Monteiro, Eduarda; Vizzotto, Bruno; Diniz, Cláudio; Maule, Marilena; Zatt, Bruno; Bampi, Sergio

doi:10.1007/s10766-012-0216-7

Published: 21 August 2012

Volume 42, pages 239–264, (2014)

International Journal of Parallel Programming

Abstract

This work presents an efficient method for mapping the Full Search algorithm for Motion Estimation (ME) onto General-Purpose Graphics Processing Unit (GPGPU) architectures using the Compute Unified Device Architecture (CUDA) programming model. Our method jointly exploits the massive parallelism available in current GPGPU devices and the parallelism inherent in the Full Search algorithm. Our main goal is to evaluate the feasibility of implementing video codecs on GPGPUs, along with the advantages and drawbacks relative to other platforms. For comparison, three further solutions were developed using distinct programming paradigms for distinct underlying hardware architectures: (i) a sequential solution for a general-purpose processor (GPP); (ii) a parallel solution for a multi-core GPP using the OpenMP library; and (iii) a distributed solution for cluster/grid machines using the Message Passing Interface (MPI) library. The CUDA-based solution for GPGPUs achieves speed-ups consistent with those predicted by the theoretical model for different search areas. Our GPGPU Full Search Motion Estimation provides 2×, 20× and 1664× speed-ups compared to the MPI, OpenMP and sequential implementations, respectively. Compared to the state of the art, our solution reaches up to a 17× speed-up.
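
The abstract does not reproduce the algorithm itself, so the following is only a minimal sketch of Full Search block matching, not the authors' implementation. It assumes 8-bit luma frames stored row-major, the sum of absolute differences (SAD) as the matching criterion, and a square search window; all names (`MotionVector`, `block_sad`, `full_search_block`, `full_search_frame`) are hypothetical. Because each block is searched independently of the others, the outer loop over blocks is the natural place to parallelize; the OpenMP pragma illustrates this in the spirit of the multi-core variant and is simply ignored when the file is compiled without OpenMP support.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Best displacement found for one block, with its SAD cost. */
typedef struct {
    int dx, dy;
    unsigned sad;
} MotionVector;

/* SAD between the n x n block of `cur` at (cx, cy) and the n x n
 * candidate block of `ref` at (rx, ry). Both frames are row-major
 * with `width` samples per row. */
static unsigned block_sad(const unsigned char *cur, const unsigned char *ref,
                          int width, int cx, int cy, int rx, int ry, int n)
{
    unsigned sad = 0;
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            sad += (unsigned)abs((int)cur[(cy + y) * width + cx + x] -
                                 (int)ref[(ry + y) * width + rx + x]);
    return sad;
}

/* Full Search for one block: evaluate every displacement in
 * [-range, range] x [-range, range] whose candidate block lies fully
 * inside the reference frame, and keep the one with the lowest SAD.
 * Ties keep the first candidate visited. */
MotionVector full_search_block(const unsigned char *cur,
                               const unsigned char *ref,
                               int width, int height,
                               int bx, int by, int n, int range)
{
    assert(bx >= 0 && by >= 0 && bx + n <= width && by + n <= height);
    MotionVector best = {0, 0, UINT_MAX};
    for (int dy = -range; dy <= range; ++dy) {
        for (int dx = -range; dx <= range; ++dx) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + n > width || ry + n > height)
                continue; /* candidate falls outside the frame */
            unsigned sad = block_sad(cur, ref, width, bx, by, rx, ry, n);
            if (sad < best.sad)
                best = (MotionVector){dx, dy, sad};
        }
    }
    return best;
}

/* Estimate one motion vector per n x n block, in raster order.
 * `mv` must hold (width / n) * (height / n) entries. Blocks do not
 * depend on each other, so the loop can be split across threads. */
void full_search_frame(const unsigned char *cur, const unsigned char *ref,
                       int width, int height, int n, int range,
                       MotionVector *mv)
{
    int bw = width / n, bh = height / n;
    #pragma omp parallel for
    for (int b = 0; b < bw * bh; ++b)
        mv[b] = full_search_block(cur, ref, width, height,
                                  (b % bw) * n, (b / bw) * n, n, range);
}
```

The cost of this exhaustive scheme grows with the number of candidates, (2·range + 1)² per block, which is why the size of the search area matters so much for the achievable speed-up. A GPU mapping would typically distribute blocks, candidates, or both across threads, but how the work is actually partitioned in the CUDA solution is not stated in the abstract.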

Author information

Authors and Affiliations

Informatics Institute, PPGC, PGMICRO, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
Eduarda Monteiro, Bruno Vizzotto, Cláudio Diniz, Marilena Maule, Bruno Zatt & Sergio Bampi

Corresponding author

Correspondence to Eduarda Monteiro.

About this article

Cite this article

Monteiro, E., Vizzotto, B., Diniz, C. et al. Parallelization of Full Search Motion Estimation Algorithm for Parallel and Distributed Platforms. Int J Parallel Prog 42, 239–264 (2014). https://doi.org/10.1007/s10766-012-0216-7

Received: 14 December 2011
Accepted: 03 August 2012
Published: 21 August 2012
Issue Date: April 2014
DOI: https://doi.org/10.1007/s10766-012-0216-7
