UHD 8K energy-quality scalable HEVC intra-prediction SAD unit hardware using optimized and configurable imprecise adders

Porto, Roger; Correa, Marcel; Goebel, Jones; Zatt, Bruno; Roma, Nuno; Agostini, Luciano; Porto, Marcelo

doi:10.1007/s11554-019-00934-2


Original Research Paper
Published: 11 December 2019

Volume 17, pages 1685–1701 (2020)

Journal of Real-Time Image Processing

Roger Porto (ORCID: orcid.org/0000-0001-7367-1575)^1,2,
Marcel Correa^1,2,
Jones Goebel^1,
Bruno Zatt^1,
Nuno Roma^3,4,
Luciano Agostini^1 &
Marcelo Porto^1


Abstract

Real-time digital video coding has become a mandatory feature in current consumer electronic devices due to the popularization of video applications. However, efficiently encoding video is an extremely processing- and energy-demanding task, especially at high resolutions and frame rates. Limited energy resources and dynamically varying system status (such as workload, battery level, and user settings) therefore require energy-efficient solutions capable of supporting run-time energy-quality scalability. In this work, we present an energy-quality scalable SAD (sum of absolute differences) unit hardware architecture for HEVC intra-frame prediction, targeting real-time processing of UHD 8K (7680 × 4320) video at 60 frames per second. Approximate computing provides the energy-quality scalability through configurable imprecise operators. The proposed architecture supports four operation points: precise computing and 3-bit, 5-bit, or 7-bit imprecision. When implemented in a 45-nm technology using the Nangate standard cell library and running at 269 MHz, the proposed architecture consumes from 8.42 down to 7.38 mJ to process each UHD 8K frame, depending on the selected imprecision level. As a drawback, the coding efficiency is reduced, with a BD-rate penalty ranging from 0.28% to 1.72%. Compared with related works, this is the only intra-frame prediction SAD unit able to provide energy-quality scalability.
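To make the idea of a configurable imprecise SAD more concrete, the following is a minimal software sketch, not the authors' hardware design. The abstract does not say which imprecise adder the architecture uses, so this sketch assumes a lower-part-OR adder (LOA) as a stand-in: the `k` least significant bits are combined with a bitwise OR and only the upper bits are added exactly. The names `loa_add`, `sad`, and `IMPRECISION_LEVELS` are hypothetical. Computing the absolute differences exactly and accumulating them sequentially (rather than in an adder tree) are also simplifying assumptions.

```python
# Illustrative model only: the LOA adder is an assumed stand-in for the
# paper's "configurable imprecise adders", whose internals are not given
# in the abstract.

# Operation points named in the abstract: precise (0), 3-, 5-, and 7-bit.
IMPRECISION_LEVELS = (0, 3, 5, 7)


def loa_add(a: int, b: int, k: int) -> int:
    """Add two non-negative integers with k imprecise low-order bits.

    With k == 0 the result is exact. Otherwise the k low bits are OR-ed,
    and the carry into the exact upper part is the AND of the operands'
    bit k-1 (the usual LOA carry approximation).
    """
    if k == 0:
        return a + b
    mask = (1 << k) - 1
    low = (a | b) & mask
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1
    high = (a >> k) + (b >> k) + carry
    return (high << k) | low


def sad(orig, pred, k: int = 0) -> int:
    """Sum of absolute differences between two equally sized sample lists.

    Absolute differences are exact; only the accumulation uses loa_add.
    """
    total = 0
    for o, p in zip(orig, pred):
        total = loa_add(total, abs(o - p), k)
    return total


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    predicted = [12, 18, 33, 35]
    for level in IMPRECISION_LEVELS:
        print(level, sad(original, predicted, level))
```

Under these assumptions, raising `k` replaces more of the carry chain with simple OR gates, which is the kind of trade the abstract describes: lower switching activity and energy in exchange for a less accurate distortion metric, and hence a less reliable intra-mode decision.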



Acknowledgements

This work is partly financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES) Finance Code 001, by FCT projects PTDC/EEI-HAC/30485/2017 and UID/CEC/50021/2019, and also by CNPq and FAPERGS Brazilian research support agencies.

Author information

Authors and Affiliations

Video Technology Research Group, Group of Architectures and Integrated Circuits, Federal University of Pelotas (UFPel), Pelotas, RS, 96010-900, Brazil
Roger Porto, Marcel Correa, Jones Goebel, Bruno Zatt, Luciano Agostini & Marcelo Porto
Sul-Rio-Grandense Federal Institute of Science and Technology (IFSul), Bagé, Brazil
Roger Porto & Marcel Correa
Instituto Superior Técnico (IST), Universidade de Lisboa, Lisbon, Portugal
Nuno Roma
INESC-ID, Lisbon, Portugal
Nuno Roma


Corresponding author

Correspondence to Roger Porto.


About this article

Cite this article

Porto, R., Correa, M., Goebel, J. et al. UHD 8K energy-quality scalable HEVC intra-prediction SAD unit hardware using optimized and configurable imprecise adders. J Real-Time Image Proc 17, 1685–1701 (2020). https://doi.org/10.1007/s11554-019-00934-2


Received: 30 April 2019
Accepted: 28 November 2019
Published: 11 December 2019
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11554-019-00934-2
