Abstract
This paper presents a novel full-search motion estimation co-processor architecture. The proposed architecture reuses search area data efficiently to minimize memory I/O while fully utilizing the hardware resources. Its main components are a smart processing element (PE) and a simple, efficient internal memory. An efficient algorithm loads both the current block and the search area into the PE array: the search area data flow horizontally while the current block data remain stationary. As a result, the co-processor improves on state-of-the-art techniques in both throughput and operating frequency. The local memory and PE design guarantees a simple and regular data flow. The local memory is implemented using only registers and a simple counter, which avoids complicated addressing when reading from or writing to it. The proposed architecture is implemented using both FPGA and ASIC design flows. For a search range of 32 × 32 and a block size of 16 × 16, the architecture can perform motion estimation for 30 fps HDTV video at 350 MHz and outperforms many fast full-search architectures.
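For readers unfamiliar with full-search block matching, the following is a minimal software sketch of the computation that such a co-processor accelerates. It is not the authors' hardware design. The sum of absolute differences (SAD) cost, the function names `sad`, `full_search`, and `cycles_per_block`, and the 1920 × 1080 frame size used in the usage note are all assumptions made for illustration, since the abstract does not specify them.

```python
from math import ceil


def sad(cur, area, top, left, n):
    """Sum of absolute differences between the n x n block `cur` and the
    n x n window of `area` whose top-left corner is at (top, left)."""
    total = 0
    for r in range(n):
        for c in range(n):
            total += abs(cur[r][c] - area[top + r][left + c])
    return total


def full_search(cur, area, n):
    """Test every candidate position of an n x n block inside `area`.

    Returns (best_sad, (dy, dx)), where (dy, dx) is the offset of the best
    match from the top-left corner of the search area. Ties keep the first
    candidate found in raster order.
    """
    rows = len(area) - n + 1      # number of vertical candidate positions
    cols = len(area[0]) - n + 1   # number of horizontal candidate positions
    best = None
    for dy in range(rows):
        # Horizontally adjacent candidates share n - 1 columns of search
        # area data; this overlap is what a data-reuse architecture exploits.
        for dx in range(cols):
            cost = sad(cur, area, dy, dx, n)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


def cycles_per_block(width, height, fps, clock_hz, n=16):
    """Average clock cycles available per n x n block for real-time
    operation. Frame dimensions are an assumption supplied by the caller."""
    blocks_per_frame = ceil(width / n) * ceil(height / n)
    return clock_hz / (blocks_per_frame * fps)
```

As a usage example, `cycles_per_block(1920, 1080, 30, 350e6)` gives roughly 1,430 cycles per 16 × 16 block under the assumed frame size. With on the order of a thousand candidate positions in a 32 × 32 search range, that leaves only about one cycle per candidate, which suggests why the per-pixel differences must be computed in parallel across a PE array rather than sequentially as in this sketch.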
Acknowledgments
The authors acknowledge the financial support of the Deanship of Scientific Research, University of Bahrain, Bahrain, which helped to finalize this work.
Cite this article
Ismail, Y., El-Medany, W., Al-Junaid, H. et al. High performance architecture for real-time HDTV broadcasting. J Real-Time Image Proc 11, 633–644 (2016). https://doi.org/10.1007/s11554-014-0430-1
DOI: https://doi.org/10.1007/s11554-014-0430-1