DOI: 10.1145/3470496.3527404

Anticipating and eliminating redundant computations in accelerated sparse training

Published: 11 June 2022

Abstract

Deep Neural Networks (DNNs) are the state of the art in image, speech, and text processing. To address long training times and high energy consumption, custom accelerators can exploit sparsity, that is, zero-valued weights, activations, and gradients. Proposed sparse Convolutional Neural Network (CNN) accelerators support training with no more than one dynamically sparse convolution input. Among existing accelerator classes, only outer-product-based accelerators support two-sided dynamic sparsity. However, when a convolution is mapped onto an outer product, some multiplications do not correspond to any valid output. These Redundant Cartesian Products (RCPs) decrease energy efficiency and performance. We observe that in sparse training, up to 90% of computations are RCPs, resulting from the convolution of large matrices for weight updates during the backward pass of CNN training.
In this work, we design a mechanism, ANT, to anticipate and eliminate RCPs, enabling more efficient sparse training when integrated with an outer-product accelerator. By anticipating over 90% of RCPs, ANT achieves a geometric-mean speedup of 3.71× over an SCNN-like accelerator [67] on 90% sparse training with DenseNet-121 [38], ResNet-18 [35], VGG16 [73], Wide ResNet (WRN) [85], and ResNet-50 [35], together with a 4.40× reduction in energy consumption and 0.0017 mm² of additional area. We extend ANT to sparse matrix multiplication, so that the same accelerator can anticipate RCPs in sparse fully-connected layers, transformers, and RNNs.
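To make the RCP notion concrete, the following is a minimal, illustrative 1-D sketch in Python, written for this summary and not taken from the paper (ANT's hardware mechanism and the SCNN dataflow differ): when a sparse convolution is computed as the Cartesian product of the non-zero operands, every pairwise product is formed first and only then scattered to an output coordinate, so products whose coordinate falls outside the valid output range are wasted work. All function and variable names below are hypothetical.

```python
# Minimal sketch of mapping a sparse convolution onto an outer product of
# non-zero values, and counting the Cartesian products that land outside the
# valid output range, i.e. Redundant Cartesian Products (RCPs).
import numpy as np

def outer_product_conv1d(activations, weights):
    """Valid-mode 1-D correlation (as in CNN layers) computed via the
    Cartesian product of non-zero activations and non-zero weights.
    Returns the output and the number of discarded products (RCPs)."""
    out_len = len(activations) - len(weights) + 1
    out = np.zeros(out_len)
    nz_act = [(i, a) for i, a in enumerate(activations) if a != 0]
    nz_wgt = [(j, w) for j, w in enumerate(weights) if w != 0]
    rcps = 0
    for i, a in nz_act:            # every non-zero activation ...
        for j, w in nz_wgt:        # ... meets every non-zero weight
            o = i - j              # output coordinate this product maps to
            if 0 <= o < out_len:
                out[o] += a * w    # useful product: accumulate
            else:
                rcps += 1          # redundant product: no valid output exists
    return out, rcps

rng = np.random.default_rng(0)
act = rng.random(64) * (rng.random(64) < 0.1)   # ~90% sparse activations
wgt = rng.random(32) * (rng.random(32) < 0.1)   # large, ~90% sparse second operand
out, rcps = outer_product_conv1d(act, wgt)
total = np.count_nonzero(act) * np.count_nonzero(wgt)
print(f"{rcps}/{total} Cartesian products were redundant")
```

The fraction of discarded products grows with the size of the second operand relative to the output, which is one intuition for why the weight-gradient convolution of the backward pass, where two large tensors are convolved to produce a small weight gradient, is dominated by RCPs.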

References

[1]
2020. NVIDIA A100 Tensor Core GPU Architecture. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf.
[2]
Jorge Albericio, Alberto Delmás, Patrick Judd, Sayeh Sharify, Gerard O'Leary, Roman Genov, and Andreas Moshovos. 2017. Bit-Pragmatic Deep Neural Network Computing. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-50 '17). Association for Computing Machinery, New York, NY, USA, 382--394.
[3]
Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. ACM SIGARCH Computer Architecture News 44, 3 (June 2016), 1--13.
[4]
Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. Advances in Neural Information Processing Systems 33 (2020), 12449--12460.
[5]
Christian Bartz. 2021. chainer-transformer.
[6]
Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. 2016. End to End Learning for Self-Driving Cars. arXiv:1604.07316 [cs] (April 2016). arXiv:1604.07316 [cs]
[7]
Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training Deep Nets with Sublinear Memory Cost. arXiv:1604.06174 [cs] (April 2016). arXiv:1604.06174 [cs]
[8]
Y. Chen, T. Krishna, J. S. Emer, and V. Sze. 2017. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits 52, 1 (Jan. 2017), 127--138.
[9]
Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, and O. Temam. 2014. DaDianNao: A Machine-Learning Supercomputer. In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture. 609--622.
[10]
Y. Chen, T. Yang, J. Emer, and V. Sze. 2019. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9, 2 (June 2019), 292--308.
[11]
Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, and Daniel Soudry. 2020. Neural Gradients Are Near-Lognormal: Improved Quantized and Sparse Training. In International Conference on Learning Representations.
[12]
Jungwook Choi, Pierce I.-Jen Chuang, Zhuo Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. 2018. Bridging the Accuracy Gap for 2-Bit Quantized Neural Networks (QNN). arXiv:1807.06964 [cs] (July 2018). arXiv:1807.06964 [cs]
[13]
Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2016. BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. arXiv:1511.00363 [cs] (April 2016). arXiv:1511.00363 [cs]
[14]
Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. arXiv:1602.02830 [cs] (March 2016). arXiv:1602.02830 [cs]
[15]
William J. Dally, R. Curtis Harting, and Tor M. Aamodt. 2016. Digital Design Using VHDL: A Systems Approach (1st ed.). Cambridge University Press, USA.
[16]
Dipankar Das, Naveen Mellempudi, Dheevatsa Mudigere, Dhiraj Kalamkar, Sasikanth Avancha, Kunal Banerjee, Srinivas Sridharan, Karthik Vaidyanathan, Bharat Kaul, Evangelos Georganas, Alexander Heinecke, Pradeep Dubey, Jesus Corbal, Nikita Shustrov, Roma Dubtsov, Evarist Fomenko, and Vadim Pirogov. 2018. Mixed Precision Training of Convolutional Neural Networks Using Integer Operations. In International Conference on Learning Representations.
[17]
Alberto Delmas Lascorz, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic, Kevin Siu, and Andreas Moshovos. 2019. Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '19). Association for Computing Machinery, New York, NY, USA, 749--763.
[18]
Chunhua Deng, Yang Sui, Siyu Liao, Xuehai Qian, and Bo Yuan. 2021. GoSPA: An Energy-Efficient High-Performance Globally Optimized SParse Convolutional Neural Network Accelerator. In Proceedings of the 48th International Symposium on Computer Architecture (ISCA '21). Association for Computing Machinery, 1110--1123.
[19]
J. Deng, W. Dong, R. Socher, L. Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. 248--255.
[20]
Tim Dettmers and Luke Zettlemoyer. 2019. Sparse Networks from Scratch: Faster Training without Losing Performance. arXiv:1907.04840 [cs, stat] (Aug. 2019). arXiv:1907.04840 [cs, stat]
[21]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs] (May 2019). arXiv:1810.04805 [cs]
[22]
R. D. Evans, L. Liu, and T. M. Aamodt. 2020. JPEG-ACT: Accelerating Deep Learning via Transform-Based Lossy Compression. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 860--873.
[23]
Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, and Erich Elsen. 2020. Rigging the Lottery: Making All Tickets Winners. In International Conference on Machine Learning. PMLR, 2943--2952.
[24]
Andrew Feldman. 2020. Cerebras Wafer Scale Engine: Why We Need Big Chips for Deep Learning.
[25]
Jonathan Frankle and Michael Carbin. 2019. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv:1803.03635 [cs] (March 2019). arXiv:1803.03635 [cs]
[26]
J.P. Fricker and A. Hock. 2019. Building a Wafer-Scale Deep Learning System: Lessons Learned.
[27]
Georgios Georgiadis. 2019. Accelerating Convolutional Neural Networks via Activation Map Compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7085--7095.
[28]
Negar Goli and Tor M. Aamodt. 2020. ReSprop: Reuse Sparsified Backpropagation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1548--1558.
[29]
Maximilian Golub, Guy Lemieux, and Mieszko Lis. 2019. Full Deep Neural Network Training On A Pruned Weight Budget. Proceedings of Machine Learning and Systems 1 (April 2019), 252--263.
[30]
Aidan N. Gomez, Mengye Ren, Raquel Urtasun, and Roger B. Grosse. 2017. The Reversible Residual Network: Backpropagation Without Storing Activations. In NIPS.
[31]
Ashish Gondimalla, Noah Chesnut, Mithuna Thottethodi, and T. N. Vijaykumar. 2019. SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '52). Association for Computing Machinery, New York, NY, USA, 151--165.
[32]
U. Gupta, B. Reagen, L. Pentecost, M. Donato, T. Tambe, A. M. Rush, G. Wei, and D. Brooks. 2019. MASR: A Modular Accelerator for Sparse RNNs. In 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). 1--14.
[33]
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. ACM SIGARCH Computer Architecture News 44, 3 (June 2016), 243--254.
[34]
Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv:1510.00149 [cs] (Feb. 2016). arXiv:1510.00149 [cs]
[35]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770--778.
[36]
Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, and Yi Yang. 2018. Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18). AAAI Press, Stockholm, Sweden, 2234--2240.
[37]
Yihui He, Xiangyu Zhang, and Jian Sun. 2017. Channel Pruning for Accelerating Very Deep Neural Networks. In 2017 IEEE International Conference on Computer Vision (ICCV). IEEE Computer Society, 1398--1406.
[38]
Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. 2017. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39]
Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism. arXiv:1811.06965 [cs] (July 2019). arXiv:1811.06965 [cs]
[40]
Zhengwei Huang, Ming Dong, Qirong Mao, and Yongzhao Zhan. 2014. Speech Emotion Recognition Using CNN. In Proceedings of the 22nd ACM International Conference on Multimedia (MM '14). Association for Computing Machinery, New York, NY, USA, 801--804.
[41]
Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2018. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. Journal of Machine Learning Research 18, 187 (2018), 1--30.
[42]
Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size. arXiv:1602.07360 [cs] (Nov. 2016). arXiv:1602.07360 [cs]
[43]
Norman P Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B Jablin, George Kurian, James Laudon, Sheng Li, Peter Ma, Xiaoyu Ma, Thomas Norrie, Nishant Patil, Sushma Prasad, Cliff Young, Zongwei Zhou, and David Patterson. 2021. Ten Lessons from Three Generations Shaped Google's TPUv4i. In Proceedings of the 48th International Symposium on Computer Architecture (ISCA'21). Association for Computing Machinery.
[44]
Norman P. Jouppi, Doe Hyun Yoon, George Kurian, Sheng Li, Nishant Patil, James Laudon, Cliff Young, and David Patterson. 2020. A Domain-Specific Supercomputer for Training Deep Neural Networks. Commun. ACM 63, 7 (June 2020), 67--78.
[45]
Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). Association for Computing Machinery, New York, NY, USA, 1--12.
[46]
Alex Krizhevsky and Geoffrey Hinton. 2009. Learning Multiple Layers of Features from Tiny Images. (2009).
[47]
H.T. Kung, Bradley McDanel, and Sai Qian Zhang. 2019. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '19). Association for Computing Machinery, New York, NY, USA, 821--834.
[48]
Mark Kurtz, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, Sage Moore, Nir Shavit, and Dan Alistarh. 2020. Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks. In International Conference on Machine Learning. PMLR, 5533--5543.
[49]
Yann Le Cun, John S. Denker, and Sara A. Solla. 1989. Optimal Brain Damage. In Proceedings of the 2nd International Conference on Neural Information Processing Systems (NIPS'89). MIT Press, Cambridge, MA, USA, 598--605.
[50]
Fengfu Li, Bo Zhang, and Bin Liu. 2016. Ternary Weight Networks. arXiv:1605.04711 [cs] (Nov. 2016). arXiv:1605.04711 [cs]
[51]
Yujun Lin, Song Han, Huizi Mao, Yu Wang, and William J. Dally. 2020. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. arXiv:1712.01887 [cs, stat] (June 2020). arXiv:1712.01887 [cs, stat]
[52]
Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, and Hayden K. H. So. 2020. Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers. arXiv:2005.06870 [cs, stat] (May 2020). arXiv:2005.06870 [cs, stat]
[53]
Liu Liu, Lei Deng, Xing Hu, Maohua Zhu, Guoqi Li, Yufei Ding, and Yuan Xie. 2019. Dynamic Sparse Graph for Efficient Deep Learning. arXiv:1810.00859 [cs, stat] (May 2019). arXiv:1810.00859 [cs, stat]
[54]
S. Liu, Z. Du, J. Tao, D. Han, T. Luo, Y. Xie, Y. Chen, and T. Chen. 2016. Cambricon: An Instruction Set Architecture for Neural Networks. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 393--405.
[55]
Christos Louizos, Max Welling, and Diederik P. Kingma. 2018. Learning Sparse Neural Networks through $L_0$ Regularization. arXiv:1712.01312 [cs, stat] (June 2018). arXiv:1712.01312 [cs, stat]
[56]
Andrew Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 142--150.
[57]
M. Mahmoud, I. Edo, A. H. Zadeh, O. Mohamed Awad, G. Pekhimenko, J. Albericio, and A. Moshovos. 2020. TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 781--795.
[58]
Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, and William J. Dally. 2017. Exploring the Regularity of Sparse Structure in Convolutional Neural Networks. arXiv:1705.08922 [cs, stat] (June 2017). arXiv:1705.08922 [cs, stat]
[59]
Mayler Martins, Jody Maick Matos, Renato P. Ribas, André Reis, Guilherme Schlinker, Lucio Rech, and Jens Michelsen. 2015. Open Cell Library in 15nm FreePDK Technology (ISPD '15). ACM, 171--178.
[60]
E. Medina and E. Dagan. 2020. Habana Labs Purpose-Built AI Inference and Training Processor Architectures: Scaling AI Training Systems Using Standard Ethernet With Gaudi Processor. IEEE Micro 40, 2 (March 2020), 17--24.
[61]
Dmitry Molchanov, Arsenii Ashukha, and Dmitry Vetrov. 2017. Variational Dropout Sparsifies Deep Neural Networks. In International Conference on Machine Learning. PMLR, 2498--2507.
[62]
Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2017. Pruning Convolutional Neural Networks for Resource Efficient Inference. arXiv:1611.06440 [cs, stat] (June 2017). arXiv:1611.06440 [cs, stat]
[63]
Hesham Mostafa and Xin Wang. 2019. Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization. In International Conference on Machine Learning. PMLR, 4646--4655.
[64]
Vinod Nair and Geoffrey E. Hinton. 2010. Rectified Linear Units Improve Restricted Boltzmann Machines. In ICML.
[65]
Thomas Norrie, Nishant Patil, Doe Hyun Yoon, George Kurian, Sheng Li, James Laudon, Cliff Young, Norman Jouppi, and David Patterson. 2021. The Design Process for Google's Training Chips: TPUv2 and TPUv3. IEEE Micro 41, 2 (March 2021), 56--63.
[66]
S. Pal, J. Beaumont, D. Park, A. Amarnath, S. Feng, C. Chakrabarti, H. Kim, D. Blaauw, T. Mudge, and R. Dreslinski. 2018. OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). 724--736.
[67]
A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S. W. Keckler, and W. J. Dally. 2017. SCNN: An Accelerator for Compressed-Sparse Convolutional Neural Networks. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA '17). 27--40.
[68]
E. Qin, A. Samajdar, H. Kwon, V. Nadella, S. Srinivasan, D. Das, B. Kaul, and T. Krishna. 2020. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). 58--70.
[69]
Md Aamir Raihan and Tor Aamodt. 2020. Sparse Weight Activation Training. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 15625--15638.
[70]
M. Rhu, M. O'Connor, N. Chatterjee, J. Pool, Y. Kwon, and S. W. Keckler. 2018. Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). 78--91.
[71]
David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning Representations by Back-Propagating Errors. Nature 323, 6088 (Oct. 1986), 533--536.
[72]
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. 2017. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815 [cs] (Dec. 2017). arXiv:1712.01815 [cs]
[73]
Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs] (Sept. 2014). arXiv:1409.1556 [cs]
[74]
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. The Journal of Machine Learning Research 15, 1 (Jan. 2014), 1929--1958.
[75]
James E. Stine, Ivan Castellanos, Michael Wood, Jeff Henson, Fred Love, W. Rhett Davis, Paul D. Franzon, Michael Bucher, Sunil Basavarajaiah, Julie Oh, and Ravi Jenkal. 2007. FreePDK: An Open-Source Variation-Aware Design Kit. In 2007 IEEE International Conference on Microelectronic Systems Education (MSE'07). 173--174.
[76]
Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. 2017. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. Proc. IEEE 105, 12 (2017), 2295--2329.
[77]
Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In International Conference on Machine Learning. PMLR, 6105--6114.
[78]
Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, Shunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, and Hiroyuki Yamazaki Vincent. 2019. Chainer: A Deep Learning Framework for Accelerating the Research Cycle. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2002--2011.
[79]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. Advances in Neural Information Processing Systems 30 (2017).
[80]
Isak Edo Vivancos, Ali Hadi Zadeh, and Omar Mohamed Awad. 2021. DNNSim. https://github.com/isakedo/DNNsim.
[81]
Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, and Jingwen Leng. 2021. Dual-Side Sparse Tensor Core. In Proceedings of the 48th International Symposium on Computer Architecture (ISCA'21). Association for Computing Machinery, 1083--1095.
[82]
Shuang Wu, Guoqi Li, Feng Chen, and Luping Shi. 2018. Training and Inference with Integers in Deep Neural Networks. In International Conference on Learning Representations.
[83]
D. Yang, A. Ghasemazar, X. Ren, M. Golub, G. Lemieux, and M. Lis. 2020. Procrustes: A Dataflow and Accelerator for Sparse Deep Neural Network Training. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 711--724.
[84]
Xuan Yang, Mingyu Gao, Qiaoyi Liu, Jeff Setter, Jing Pu, Ankita Nayak, Steven Bell, Kaidi Cao, Heonjae Ha, Priyanka Raina, Christos Kozyrakis, and Mark Horowitz. 2020. Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '20). Association for Computing Machinery, New York, NY, USA, 369--383.
[85]
Sergey Zagoruyko and Nikos Komodakis. 2017. Wide Residual Networks. arXiv:1605.07146 [cs] (June 2017). arXiv:1605.07146 [cs]
[86]
S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen. 2016. Cambricon-X: An Accelerator for Sparse Neural Networks. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1--12.
[87]
Tianyun Zhang, Shaokai Ye, Kaiqi Zhang, Jian Tang, Wujie Wen, Makan Fardad, and Yanzhi Wang. 2018. A Systematic DNN Weight Pruning Framework Using Alternating Direction Method of Multipliers. In Proceedings of the European Conference on Computer Vision (ECCV). 184--199.
[88]
Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. 2018. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. arXiv:1606.06160 [cs] (Feb. 2018). arXiv:1606.06160 [cs]
[89]
Maohua Zhu, Tao Zhang, Zhenyu Gu, and Yuan Xie. 2019. Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-Wise Sparse Neural Networks on Modern GPUs. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '52). Association for Computing Machinery, New York, NY, USA, 359--371.
[90]
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2018. Learning Transferable Architectures for Scalable Image Recognition. arXiv:1707.07012 [cs, stat] (April 2018). arXiv:1707.07012 [cs, stat]

Cited By

  • (2024) A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training. IEEE Transactions on Circuits and Systems I: Regular Papers 71, 10 (Oct. 2024), 4638--4651. DOI: 10.1109/TCSI.2024.3430831
  • (2024) SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling. In 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 1247--1263. DOI: 10.1109/MICRO61859.2024.00093
  • (2024) Approximate Communication in Network-on-Chips for Training and Inference of Image Classification Models. In Design and Applications of Emerging Computer Systems, 709--740. DOI: 10.1007/978-3-031-42478-6_27
  • (2023) Accelerating Convolutional Neural Networks by Exploiting the Sparsity of Output Activation. IEEE Transactions on Parallel and Distributed Systems 34, 12 (Dec. 2023), 3253--3265. DOI: 10.1109/TPDS.2023.3324934


      Published In

      ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture
      June 2022
      1097 pages
      ISBN:9781450386104
      DOI:10.1145/3470496

      In-Cooperation

• IEEE CS TCCA: IEEE CS Technical Committee on Computer Architecture

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. hardware acceleration
      2. sparse CNN training
      3. sparse matrix multiplication

      Qualifiers

      • Research-article

      Funding Sources

• Natural Sciences and Engineering Research Council of Canada (NSERC)

      Conference

      ISCA '22

      Acceptance Rates

      ISCA '22 Paper Acceptance Rate 67 of 400 submissions, 17%;
      Overall Acceptance Rate 543 of 3,203 submissions, 17%
