Abstract
In H.264/AVC, the motion estimation (ME) routine supports variable block sizes and involves highly parallel sum of absolute differences (SAD) computations. In this study, we introduce a 2D architecture built from bit-serial, hybrid-grained processing elements (PEs) that provides both early termination and intensive data reuse. The PEs use most-significant-bit-first (MSB-first) arithmetic to enable early termination, and the 2D architecture enables on-chip data reuse between neighboring PEs in a bit-by-bit pipelined fashion. The hybrid-grained PEs reduce the hardware overhead of the conventional adder tree structures used to implement variable block size ME. Compared with its ASIC counterpart, our design reduces the gate count by 7x while operating at a comparable frequency and sustaining 30 fps and 60 fps. It also outperforms bit-parallel and bit-serial architectures in throughput and performance per gate across various video formats.
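The two mechanisms named above, MSB-first early termination and adder-tree merging of SADs for variable block sizes, can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' hardware: it assumes 8-bit samples, forms each absolute difference in full before consuming it one bit-plane at a time (the paper's PEs operate on bit-serial operands directly, which this sketch does not model), and compares a running lower bound against the best SAD found so far. The function names `msb_first_sad`, `sad4x4_grid`, and `vbs_sads` are hypothetical. The merge step follows the standard H.264 partitioning of a 16x16 macroblock into seven block sizes, which yields 41 SADs per search position; it models the conventional adder tree whose overhead the hybrid-grained PEs are meant to reduce.

```python
"""Hypothetical software model (not the authors' RTL) of two ideas:
MSB-first SAD accumulation with early termination, and merging
4x4-block SADs into all H.264 variable block sizes (an adder tree).

Assumptions: 8-bit pixels; each |cur - ref| is formed in full and then
consumed one bit-plane at a time, MSB first.
"""
from typing import Dict, List, Sequence, Tuple

PIXEL_BITS = 8  # assumed 8-bit luma samples


def msb_first_sad(cur: Sequence[int], ref: Sequence[int], best: int) -> Tuple[int, int]:
    """Return (partial_sum, planes_processed).

    After each bit-plane, the partial sum is a lower bound on the full SAD,
    because lower-order bits can only add to it. Once it reaches `best`,
    the candidate cannot win, so accumulation stops early.
    """
    diffs = [abs(c - r) for c, r in zip(cur, ref)]
    partial = 0
    for plane in range(PIXEL_BITS - 1, -1, -1):
        ones = sum((d >> plane) & 1 for d in diffs)  # one bit column
        partial += ones << plane
        if partial >= best:
            return partial, PIXEL_BITS - plane  # early termination
    return partial, PIXEL_BITS  # full SAD


def sad4x4_grid(cur: List[List[int]], ref: List[List[int]]) -> List[List[int]]:
    """SAD of each 4x4 block of a 16x16 macroblock, indexed [block_row][block_col]."""
    return [
        [
            sum(
                abs(cur[4 * br + y][4 * bc + x] - ref[4 * br + y][4 * bc + x])
                for y in range(4)
                for x in range(4)
            )
            for bc in range(4)
        ]
        for br in range(4)
    ]


def vbs_sads(s: List[List[int]]) -> Dict[str, List[int]]:
    """Merge 16 4x4 SADs into the 41 SADs of the seven H.264 partition sizes (WxH)."""
    out = {"4x4": [s[r][c] for r in range(4) for c in range(4)]}
    out["8x4"] = [s[r][c] + s[r][c + 1] for r in range(4) for c in (0, 2)]
    out["4x8"] = [s[r][c] + s[r + 1][c] for r in (0, 2) for c in range(4)]
    s8 = [
        [s[r][c] + s[r][c + 1] + s[r + 1][c] + s[r + 1][c + 1] for c in (0, 2)]
        for r in (0, 2)
    ]
    out["8x8"] = [s8[i][j] for i in range(2) for j in range(2)]
    out["16x8"] = [s8[i][0] + s8[i][1] for i in range(2)]  # top, bottom halves
    out["8x16"] = [s8[0][j] + s8[1][j] for j in range(2)]  # left, right halves
    out["16x16"] = [sum(out["8x8"])]
    return out


if __name__ == "__main__":
    cur = [[(3 * x + 5 * y) % 256 for x in range(16)] for y in range(16)]
    ref = [[v + 2 for v in row] for row in cur]  # uniform offset of 2
    sads = vbs_sads(sad4x4_grid(cur, ref))
    print(sads["16x16"][0], sum(len(v) for v in sads.values()))
    print(msb_first_sad([0] * 16, [200] * 16, best=1000))
```

Running the module prints `512 41` (a uniform offset of 2 over 256 pixels, and 41 partition SADs), then `(2048, 1)`: with every difference equal to 200, the MSB plane alone contributes 16 x 128 = 2048, which already exceeds the best-so-far bound of 1000, so the remaining seven planes are skipped. Because lower-order planes can only increase the sum, rejecting a candidate this way never discards a better match.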

Acknowledgements
The authors would like to thank Gregory Striemer for his contributions to this paper during the analysis of the results.
About this article
Cite this article
Song, Y., Akoglu, A. Bit-by-Bit Pipelined and Hybrid-Grained 2D Architecture for Motion Estimation of H.264/AVC. J Sign Process Syst 68, 49–62 (2012). https://doi.org/10.1007/s11265-010-0575-5