
Loom: exploiting weight and activation precisions to accelerate convolutional neural networks

Published: 24 June 2018
DOI: 10.1145/3195970.3196072

Abstract

Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs), is presented. In LM, every bit of data precision that can be saved translates into a proportional performance gain. LM exploits profile-derived per-layer precisions for both weights and activations. At runtime, it further trims activation precisions at a granularity much finer than a layer. On average, across several image-classification CNNs and for a configuration that can perform the equivalent of 128 16b × 16b multiply-accumulate operations per cycle, LM outperforms a state-of-the-art bit-parallel accelerator [3] by 3.19× without any loss in accuracy while being 2.59× more energy efficient. LM can trade off accuracy for additional improvements in execution performance and energy efficiency, and compares favorably to an accelerator that targets only activation precisions.
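
The abstract's proportionality claim follows from bit-serial arithmetic: a Pw-bit weight multiplied by a Pa-bit activation decomposes into Pw × Pa one-bit partial products, so the step count, and hence performance, scales with the product of the two precisions. The Python sketch below illustrates only this shift-and-add principle; it is not the Loom hardware, and the function name and interface are hypothetical.

    # Minimal illustration (not the Loom design): bit-serial multiply-accumulate
    # whose step count scales with the product of the weight precision (pw)
    # and the activation precision (pa).
    def bit_serial_mac(weights, activations, pw, pa):
        """Accumulate sum(w * a) one weight-bit/activation-bit pair at a time.

        pw, pa: precisions in bits (profile-derived per layer in the paper).
        Returns the sum and the number of one-bit steps (pw * pa per pair).
        """
        total, steps = 0, 0
        for w, a in zip(weights, activations):
            for i in range(pw):                    # serial over weight bits
                w_bit = (w >> i) & 1
                for j in range(pa):                # serial over activation bits
                    a_bit = (a >> j) & 1
                    total += (w_bit & a_bit) << (i + j)  # 1b x 1b partial product
                    steps += 1
        return total, steps

    if __name__ == "__main__":
        w, a = [3, 5, 7], [2, 4, 6]
        result, steps = bit_serial_mac(w, a, pw=8, pa=8)
        assert result == sum(x * y for x, y in zip(w, a))  # 68
        print(result, steps)                               # 68, 3 * 8 * 8 = 192

Under this model, trimming both precisions from 16b to 8b cuts the step count by (16 × 16) / (8 × 8) = 4×; the paper's reported 3.19× average speedup reflects the same mechanism applied with the varying per-layer (and finer-grained runtime) precisions of real networks.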

References

[1] Jorge Albericio, Alberto Delmás, Patrick Judd, Sayeh Sharify, Gerard O'Leary, Roman Genov, and Andreas Moshovos. 2017. Bit-pragmatic Deep Neural Network Computing. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-50 '17). 382--394.
[2] Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA '16).
[3] Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2014. DaDianNao: A Machine-Learning Supercomputer. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47). 609--622.
[4] Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA '16). IEEE Press, Piscataway, NJ, USA, 243--254.
[5] Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, Raquel Urtasun, and Andreas Moshovos. 2015. Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets. arXiv:1511.05236v4 [cs.LG] (2015).
[6] Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, and Andreas Moshovos. 2016. Stripes: Bit-serial Deep Neural Network Computing. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-49).
[7] Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor M. Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks. In Proceedings of the 2016 International Conference on Supercomputing (ICS '16). 23.
[8] Alberto Delmas Lascorz, Sayeh Sharify, Patrick Judd, and Andreas Moshovos. 2017. Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks. CoRR abs/1706.00504 (2017). http://arxiv.org/abs/1706.00504
[9] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single Shot MultiBox Detector. arXiv:1512.02325 [cs.CV] (2016).
[10] Rastislav Lukac. 2016. Computational Photography: Methods and Applications. CRC Press.
[11] Naveen Muralimanohar and Rajeev Balasubramonian. 2015. CACTI 6.0: A Tool to Understand Large Caches.
[12] Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). 27--40.
[13] M. Poremba, S. Mittal, Dong Li, J. S. Vetter, and Yuan Xie. 2015. DESTINY: A Tool for Modeling Emerging 3D NVM and eDRAM Caches. In Design, Automation & Test in Europe Conference & Exhibition (DATE).
[14] T. Szabo, L. Antoni, G. Horvath, and B. Feher. 2000. A Full-Parallel Digital Implementation for Pre-trained NNs. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), Vol. 2. 49--54.
[15] Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An Accelerator for Sparse Neural Networks. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-49). 1--12.



Published In

DAC '18: Proceedings of the 55th Annual Design Automation Conference
June 2018, 1089 pages
ISBN: 9781450357005
DOI: 10.1145/3195970
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DAC '18: The 55th Annual Design Automation Conference 2018
June 24-29, 2018
San Francisco, California

Acceptance Rates

Overall acceptance rate: 1,770 of 5,499 submissions (32%)
