Abstract
Real-time digital video coding has become a mandatory feature in current consumer electronic devices due to the popularization of video applications. However, efficiently encoding videos is an extremely processing- and energy-demanding task, especially at high resolutions and frame rates. Thus, limited energy resources and a dynamically varying system status (such as workload, battery level, and user settings) require energy-efficient solutions capable of supporting run-time energy-quality scalability. In this work, we present an energy-quality scalable sum of absolute differences (SAD) unit hardware architecture for HEVC intra-frame prediction, targeting real-time processing of UHD 8K (7680 × 4320) videos at 60 frames per second. Approximate computing provides energy-quality scalability through configurable imprecise operators. The proposed energy-quality scalable architecture supports four operation points: precise computing and 3-bit, 5-bit, or 7-bit imprecision. When implemented in a 45-nm technology using the Nangate standard cell library and running at 269 MHz, the proposed architecture consumes between 8.42 mJ and 7.38 mJ to process each UHD 8K frame, depending on the selected imprecision level. As a drawback, coding efficiency drops, with a BD-rate increase of 0.28% to 1.72%. Compared with related works, this is the only intra-frame prediction SAD unit able to provide energy-quality scalability.
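The abstract does not describe the imprecise adders themselves. The following is therefore only a hedged sketch of how configurable imprecision could enter a SAD-based intra mode decision. It assumes a Lower-part OR Adder (LOA, in the style of Mahdiani et al.) as the imprecise operator. In this sketch, the absolute differences are computed exactly and then summed in a binary adder tree built from LOAs. The parameter k = 0, 3, 5, 7 mirrors the four operation points, but the paper's own optimized adders are not reproduced here. The names loa_add, approx_sad and select_mode, as well as the 4×4 toy block, are hypothetical.

```python
from typing import Dict, List, Sequence


def loa_add(a: int, b: int, k: int) -> int:
    """Lower-part OR adder (LOA), assumed here as the imprecise operator.

    The k least-significant bits are approximated with a bitwise OR (no carry
    chain). The upper bits are added exactly, with a carry-in equal to the AND
    of both operands' bit k-1. k = 0 gives an exact adder. For k >= 1 the
    per-addition error magnitude never exceeds 2**(k-1).
    """
    if k == 0:
        return a + b
    mask = (1 << k) - 1
    low = (a | b) & mask
    carry_in = (a >> (k - 1)) & (b >> (k - 1)) & 1
    high = (a >> k) + (b >> k) + carry_in
    return (high << k) | low


def approx_sad(orig: Sequence[int], pred: Sequence[int], k: int) -> int:
    """SAD with exact absolute differences and an LOA-based binary adder tree."""
    terms: List[int] = [abs(o - p) for o, p in zip(orig, pred)]
    if not terms:
        return 0
    while len(terms) > 1:
        nxt = [loa_add(terms[i], terms[i + 1], k)
               for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:  # an odd leftover passes through to the next level
            nxt.append(terms[-1])
        terms = nxt
    return terms[0]


def select_mode(orig: Sequence[int], candidates: Dict[int, Sequence[int]],
                k: int) -> int:
    """Return the candidate mode with the smallest (possibly approximate) SAD."""
    return min(candidates, key=lambda m: approx_sad(orig, candidates[m], k))


# Toy 4x4 block, flattened row by row (made-up data, not from the paper).
orig = [100, 104, 108, 112] * 4
candidates = {
    1: [106] * 16,                  # DC-like flat prediction (HEVC mode 1)
    26: [100, 104, 108, 112] * 4,   # vertical-like prediction (HEVC mode 26)
}

for k in (0, 3, 5, 7):  # precise, then 3-, 5- and 7-bit imprecision
    sads = {m: approx_sad(orig, p, k) for m, p in candidates.items()}
    print(k, sads, select_mode(orig, candidates, k))
```

In this toy case, the vertical-like candidate wins at every k because its residual is zero, while the DC-like SAD drifts from its exact value as k grows. With real content, the approximation can change which mode is chosen. That mechanism is where a coding-efficiency (BD-rate) loss would come from in a design of this kind.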
Acknowledgements
This work is partly financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES) Finance Code 001, by FCT projects PTDC/EEI-HAC/30485/2017 and UID/CEC/50021/2019, and also by CNPq and FAPERGS Brazilian research support agencies.
About this article
Cite this article
Porto, R., Correa, M., Goebel, J. et al. UHD 8K energy-quality scalable HEVC intra-prediction SAD unit hardware using optimized and configurable imprecise adders. J Real-Time Image Proc 17, 1685–1701 (2020). https://doi.org/10.1007/s11554-019-00934-2
DOI: https://doi.org/10.1007/s11554-019-00934-2