research-article

A Hardware-Efficient Block Matching Algorithm and Its Hardware Design for Variable Block Size Motion Estimation in Ultra-High-Definition Video Encoding

Published: 10 January 2019

Abstract

Variable block size motion estimation contributes greatly to optimal interframe encoding, but it involves high computational complexity and heavy memory access, which are the most critical bottlenecks in ultra-high-definition video encoding. This article presents a hardware-efficient block matching algorithm, together with an efficient hardware design, that reduces the computational complexity of motion estimation while sustaining steady coding performance for high-quality video encoding. A three-level memory organization is proposed to reduce the memory bandwidth requirement while supporting a predictive common search window. By applying multiple search strategies and early termination, the proposed design achieves 1.8 to 3.7 times higher hardware efficiency than other published designs. Furthermore, the proposed three-level memory organization reduces on-chip memory by 96.5% and the off-chip bandwidth requirement by 39.4%. The corresponding power consumption is only 198 mW at the highest working frequency of 500 MHz. The proposed design is therefore attractive for real-time, low-power, high-quality video encoding.
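The abstract does not spell out the partitioning details, so the following is only a minimal sketch of the general idea that makes variable block size motion estimation affordable in hardware: for one candidate motion vector, SADs are computed once for the smallest (4x4) sub-blocks of a 16x16 block and then added together to obtain the SAD of every larger partition, instead of re-reading the pixels. The H.264/AVC-style partition set and the helper names `sad_4x4_grid` and `aggregate_vbs` are assumptions for illustration, not the authors' design.

```python
import numpy as np


def sad_4x4_grid(cur: np.ndarray, cand: np.ndarray) -> np.ndarray:
    """Return a 4x4 array holding the SAD of each 4x4 sub-block of a
    16x16 current block against a 16x16 candidate (reference) block."""
    diff = np.abs(cur.astype(np.int32) - cand.astype(np.int32))
    # Axes after reshape: (block row, row in block, block col, col in block).
    return diff.reshape(4, 4, 4, 4).sum(axis=(1, 3))


def aggregate_vbs(sad4: np.ndarray) -> dict:
    """Derive SADs of larger partitions (named width x height) from the
    4x4 SADs by adding neighbouring sub-blocks; no pixel is read again."""
    s = {"4x4": sad4}                                  # 4 rows x 4 cols
    s["8x4"] = sad4[:, 0::2] + sad4[:, 1::2]           # 4 rows x 2 cols
    s["4x8"] = sad4[0::2, :] + sad4[1::2, :]           # 2 rows x 4 cols
    s["8x8"] = s["8x4"][0::2, :] + s["8x4"][1::2, :]   # 2 rows x 2 cols
    s["16x8"] = s["8x8"][:, 0] + s["8x8"][:, 1]        # top, bottom
    s["8x16"] = s["8x8"][0, :] + s["8x8"][1, :]        # left, right
    s["16x16"] = int(s["8x8"].sum())                   # whole block
    return s


# Usage: the aggregated 16x16 SAD equals a direct full-block SAD.
rng = np.random.default_rng(0)
cur = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
cand = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
sads = aggregate_vbs(sad_4x4_grid(cur, cand))
direct = int(np.abs(cur.astype(np.int32) - cand.astype(np.int32)).sum())
assert sads["16x16"] == direct
```

Because every partition SAD is a sum of already-computed 4x4 SADs, a single array of 4x4 SAD units can serve all partition sizes for the same candidate in parallel; how the authors arrange this in their datapath is not described in the abstract.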
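The abstract attributes part of the efficiency gain to combining search strategies with early termination. As a rough illustration of those two ideas only, the sketch below starts a small-diamond search from a predicted motion vector and stops as soon as the best SAD reaches a threshold or no neighbour improves it. The diamond pattern, the threshold parameter `et_threshold`, the helper names, and the synthetic ramp image are assumptions; the paper's actual search strategies and termination criteria are not given here.

```python
import numpy as np


def block_sad(cur: np.ndarray, ref: np.ndarray, y: int, x: int) -> int:
    """SAD between the current block and the same-sized reference block at (y, x)."""
    h, w = cur.shape
    cand = ref[y:y + h, x:x + w]
    return int(np.abs(cur.astype(np.int32) - cand.astype(np.int32)).sum())


def diamond_search_et(cur, ref, y0, x0, pred=(0, 0), search_range=16, et_threshold=0):
    """Small-diamond search from a predicted MV with SAD-based early termination.
    Returns (motion vector (dy, dx), its SAD, number of SADs evaluated)."""
    h, w = cur.shape
    H, W = ref.shape

    def valid(mv):
        y, x = y0 + mv[0], x0 + mv[1]
        return (abs(mv[0]) <= search_range and abs(mv[1]) <= search_range
                and 0 <= y <= H - h and 0 <= x <= W - w)

    best = tuple(pred) if valid(pred) else (0, 0)
    best_sad = block_sad(cur, ref, y0 + best[0], x0 + best[1])
    evaluated = 1
    # Early termination: stop refining once the match is good enough.
    while best_sad > et_threshold:
        center, improved = best, False
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            mv = (center[0] + dy, center[1] + dx)
            if not valid(mv):
                continue
            s = block_sad(cur, ref, y0 + mv[0], x0 + mv[1])
            evaluated += 1
            if s < best_sad:
                best, best_sad, improved = mv, s, True
        if not improved:  # local minimum of the SAD surface reached
            break
    return best, best_sad, evaluated


# Usage on a synthetic ramp image whose true displacement is (2, 0).
ref = np.add.outer(10 * np.arange(64), np.arange(64)).astype(np.int32)
y0, x0 = 24, 24
cur = ref[y0 + 2:y0 + 18, x0:x0 + 16]
print(diamond_search_et(cur, ref, y0, x0))               # ((2, 0), 0, 9)
print(diamond_search_et(cur, ref, y0, x0, pred=(2, 0)))  # ((2, 0), 0, 1)
```

With a zero predictor the search needs nine SAD evaluations; with an accurate predictor, early termination fires after a single one. This is the kind of saving that predictive starting points and early termination aim for, although the numbers here come from a toy image, not from the paper.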

Information

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 24, Issue 2
March 2019
287 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/3306156
Editor: Naehyuck Chang
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 January 2019
Accepted: 01 October 2018
Revised: 01 September 2018
Received: 01 June 2018
Published in TODAES Volume 24, Issue 2

Author Tags

  1. Motion estimation
  2. hardware architecture
  3. hardware efficiency
  4. memory organization
  5. variable block size
  6. video encoding

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 19
  • Downloads (last 6 weeks): 2
Reflects downloads up to 20 Feb 2025

Cited By

  • (2024) Fast Linear Equation Solving Algorithm and its Pipelined Hardware Architecture Design for VVC Affine Motion Estimation. IEEE Transactions on Circuits and Systems for Video Technology 34:11 (Part 1), 11229-11240. DOI: 10.1109/TCSVT.2024.3414422. Online publication date: 14-Jun-2024.
  • (2024) An efficient hardware architecture of integer motion estimation based on early termination and data reuse for Versatile video coding. Expert Systems with Applications 242, 122706. DOI: 10.1016/j.eswa.2023.122706. Online publication date: May-2024.
  • (2023) Hardware-efficient algorithm and architecture design with memory and complexity reduction for semi-global matching. Integration, the VLSI Journal 92:C, 99-105. DOI: 10.1016/j.vlsi.2023.05.005. Online publication date: 1-Sep-2023.
  • (2023) A survey on motion estimation and de-hazing algorithms and architectures. Digital Signal Processing 140, 104130. DOI: 10.1016/j.dsp.2023.104130. Online publication date: Aug-2023.
  • (2022) Design and Implementation of Gray-Coded Bit-Plane Based Reconfigurable Motion Estimation Architecture Using Binary Content Addressable Memory for Video Encoder. IEEE Transactions on Consumer Electronics 68:1, 85-92. DOI: 10.1109/TCE.2021.3139944. Online publication date: Feb-2022.
  • (2022) A Real-Time Low-Power Coding Bit-Rate Control Scheme for High-Efficiency Video Coding in a Multiprocessor System-on-Chip. IEEE Systems Journal 16:1, 264-274. DOI: 10.1109/JSYST.2021.3069477. Online publication date: Mar-2022.
  • (2021) An efficient field-programmable gate array-based hardware oriented block motion estimation algorithm based on diamond adaptive rood pattern search algorithm for multi-standard video codec. Transactions of the Institute of Measurement and Control 43:16, 3672-3685. DOI: 10.1177/01423312211043035. Online publication date: 27-Sep-2021.
  • (2020) A 4K×2K@60fps Multifunctional Video Display Processor for High Perceptual Image Quality. IEEE Transactions on Circuits and Systems I: Regular Papers 67:2, 451-463. DOI: 10.1109/TCSI.2019.2921943. Online publication date: Feb-2020.
  • (2020) Digital image processing systems based on functional-oriented processors with a homogeneous structure. Journal of Physics: Conference Series 1680, 012034. DOI: 10.1088/1742-6596/1680/1/012034. Online publication date: 22-Dec-2020.
  • (2020) Design and Implementation of an Efficient Mixed Parallel-Pipeline SAD Architecture for HEVC Motion Estimation. Advances in VLSI, Communication, and Signal Processing, 605-621. DOI: 10.1007/978-981-15-6840-4_50. Online publication date: 15-Oct-2020.