Cross-layer efforts for energy-efficient computing: towards peta operations per second per watt

Hu, Xiaobo Sharon; Niemier, Michael

doi:10.1631/FITEE.1800466

Perspective
Published: 28 November 2018

Volume 19, pages 1209–1223, (2018)
Frontiers of Information Technology & Electronic Engineering

Abstract

As device scaling based on Moore’s law and the accompanying performance scaling slow down, there is increasing interest in new technologies and computational models for faster and more energy-efficient information processing. Meanwhile, there is growing evidence that beyond-CMOS devices will struggle to compete with CMOS technology in traditional Boolean circuits and von Neumann processors. Exploiting the unique characteristics of emerging devices, especially in the context of alternative circuit and architectural paradigms, has the potential to offer orders-of-magnitude improvements in power, performance, and capability. To take full advantage of beyond-CMOS devices, cross-layer efforts spanning devices, circuits, architectures, and algorithms are indispensable. This study examines energy-efficient neural network accelerators for embedded applications in this context. It highlights several deep neural network accelerator designs based on cross-layer efforts spanning alternative device technologies, circuit styles, and architectures, and presents application-level benchmarking studies. The discussion demonstrates that cross-layer efforts can indeed lead to orders-of-magnitude gains towards extreme-scale energy-efficient processing.

Acknowledgements

The authors would like to thank Mr. Qiu-wen LOU and Dr. Robert PERRICONE, who contributed several graphs and experimental data used in this study.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
Xiaobo Sharon Hu & Michael Niemier

Corresponding author

Correspondence to Xiaobo Sharon Hu.

Additional information

Project supported by the Center for Low Energy Systems Technology (LEAST), one of the six centers of STARnet, a Semiconductor Research Corporation Program sponsored by MARCO and DARPA.

About this article

Cite this article

Hu, X.S., Niemier, M. Cross-layer efforts for energy-efficient computing: towards peta operations per second per watt. Frontiers Inf Technol Electronic Eng 19, 1209–1223 (2018). https://doi.org/10.1631/FITEE.1800466

Received: 05 August 2018
Revised: 09 September 2018
Accepted: 10 October 2018
Published: 28 November 2018
Issue Date: October 2018
DOI: https://doi.org/10.1631/FITEE.1800466
