DOI: 10.1145/3297858.3304041

Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks

Published: 04 April 2019

Abstract

Weight and activation sparsity can be leveraged in hardware to boost the performance and energy efficiency of Deep Neural Networks during inference. Fully capitalizing on sparsity requires re-scheduling and mapping the execution stream to deliver non-zero weight/activation pairs to multiplier units for maximal utilization and reuse. However, permitting arbitrary value re-scheduling in memory space and in time places a considerable burden on hardware to perform dynamic, run-time routing and matching of values, and incurs significant energy inefficiencies. Bit-Tactical (TCL) is a neural network accelerator where the responsibility for exploiting weight sparsity is shared between a novel static scheduling middleware and a co-designed hardware front-end with a lightweight sparse shuffling network comprising two (2- to 8-input) multiplexers per activation input. We empirically motivate two back-end designs chosen to target bit-sparsity in activations, rather than value-sparsity, with two benefits: a) we avoid handling the dynamically sparse whole-value activation stream, and b) we uncover more ineffectual work. Across a variety of neural networks, TCL outperforms other state-of-the-art accelerators that target weight and activation sparsity, the dynamic precision requirements of activations, or their bit-level sparsity.
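To make the division of labor concrete, below is a minimal Python sketch, not the authors' implementation, of the two ideas the abstract describes: a static lookahead/lookaside weight scheduler that promotes non-zero weights into slots a small per-lane multiplexer could reach, and a count of effectual activation bits illustrating why bit sparsity exposes more ineffectual work than whole-value sparsity. The window sizes, the names schedule_weights and effectual_bits, and the wrap-around lane indexing are illustrative assumptions, not details taken from the paper.

# Illustrative sketch only; parameter names and window shapes are assumptions.
LOOKAHEAD = 2   # assumed: how many time steps ahead a weight may be promoted from
LOOKASIDE = 1   # assumed: how many neighbouring lanes a weight may be taken from

def schedule_weights(weight_lanes):
    """weight_lanes: list of per-lane weight streams (lists of equal length).
    Returns, for each time step and lane, either None (idle) or a
    (src_lane, src_step, value) triple naming the weight that lane's
    multiplexer would select that cycle."""
    lanes = len(weight_lanes)
    steps = len(weight_lanes[0])
    consumed = [[False] * steps for _ in range(lanes)]
    schedule = []
    for t in range(steps):
        row = []
        for lane in range(lanes):
            pick = None
            # Slots this lane's small mux could reach: its own current slot,
            # then slots up to LOOKAHEAD steps ahead in nearby lanes.
            candidates = [(lane, t)]
            for dt in range(1, LOOKAHEAD + 1):
                for dl in range(-LOOKASIDE, LOOKASIDE + 1):
                    sl, st = (lane + dl) % lanes, t + dt
                    if st < steps:
                        candidates.append((sl, st))
            for sl, st in candidates:
                if not consumed[sl][st] and weight_lanes[sl][st] != 0:
                    pick = (sl, st, weight_lanes[sl][st])
                    consumed[sl][st] = True
                    break
            row.append(pick)
        schedule.append(row)
    return schedule

def effectual_bits(activation, bits=16):
    """Count the non-zero bits of a non-negative fixed-point activation:
    with bit-serial arithmetic only these positions require work."""
    return bin(activation & ((1 << bits) - 1)).count("1")

if __name__ == "__main__":
    # Toy example: two weight lanes with many zeros (e.g. after pruning).
    lanes = [[3, 0, 0, 7],
             [0, 5, 0, 0]]
    for t, row in enumerate(schedule_weights(lanes)):
        print(f"step {t}: {row}")
    # Whole-value sparsity sees one non-zero activation; bit sparsity sees
    # that only 2 of 16 bit positions are effectual.
    print("effectual bits of 0x0104:", effectual_bits(0x0104))

Running the toy example compacts the three non-zero weights of the two sparse lanes into the first two schedule steps, leaving the remaining steps idle, while the bit count shows an activation that whole-value sparsity would treat as fully effectual but that bit-serial hardware could process in just two terms.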





      Published In

      ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
      April 2019
      1126 pages
      ISBN:9781450362405
      DOI:10.1145/3297858


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. deep learning acceleration
      2. sparsity

      Qualifiers

      • Research-article

      Conference

      ASPLOS '19

      Acceptance Rates

ASPLOS '19 paper acceptance rate: 74 of 351 submissions (21%)
Overall acceptance rate: 535 of 2,713 submissions (20%)


      Article Metrics

• Downloads (last 12 months): 200
• Downloads (last 6 weeks): 9
      Reflects downloads up to 17 Feb 2025


      Cited By

• (2025) Bit-Sparsity Aware Acceleration With Compact CSD Code on Generic Matrix Multiplication. IEEE Transactions on Computers, 74:2 (414-426). DOI: 10.1109/TC.2024.3483632. Online publication date: Feb-2025.
• (2024) Mentor: A Memory-Efficient Sparse-dense Matrix Multiplication Accelerator Based on Column-Wise Product. ACM Transactions on Architecture and Code Optimization, 21:4 (1-25). DOI: 10.1145/3688612. Online publication date: 20-Nov-2024.
• (2024) Dyn-Bitpool: A Two-sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization. Proceedings of the 61st ACM/IEEE Design Automation Conference (1-6). DOI: 10.1145/3649329.3655690. Online publication date: 23-Jun-2024.
• (2024) Commercial Evaluation of Zero-Skipping MAC Design for Bit Sparsity Exploitation in DL Inference. 2024 IFIP/IEEE 32nd International Conference on Very Large Scale Integration (VLSI-SoC) (1-4). DOI: 10.1109/VLSI-SoC62099.2024.10767792. Online publication date: 6-Oct-2024.
• (2024) General Purpose Deep Learning Accelerator Based on Bit Interleaving. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43:5 (1470-1483). DOI: 10.1109/TCAD.2023.3342728. Online publication date: May-2024.
• (2024) A Precision-Scalable Deep Neural Network Accelerator With Activation Sparsity Exploitation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43:1 (263-276). DOI: 10.1109/TCAD.2023.3310916. Online publication date: Jan-2024.
• (2024) 3A-ReRAM: Adaptive Activation Accumulation in ReRAM-Based CNN Accelerator. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43:1 (176-188). DOI: 10.1109/TCAD.2023.3297968. Online publication date: Jan-2024.
• (2024) Bit-Balance: Model-Hardware Codesign for Accelerating NNs by Exploiting Bit-Level Sparsity. IEEE Transactions on Computers, 73:1 (152-163). DOI: 10.1109/TC.2023.3324477. Online publication date: 1-Jan-2024.
• (2024) ERA-BS: Boosting the Efficiency of ReRAM-Based PIM Accelerator With Fine-Grained Bit-Level Sparsity. IEEE Transactions on Computers, 73:9 (2320-2334). DOI: 10.1109/TC.2023.3290869. Online publication date: Sep-2024.
• (2024) VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations. IEEE Transactions on Computers, 73:10 (2378-2390). DOI: 10.1109/TC.2023.3285095. Online publication date: Oct-2024.
