
Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products

Published: 14 July 2021

Abstract

Micro-controllers (MCUs) make up most of the processors in the world, with widespread applicability from automobiles to medical devices. The Internet of Things promises to equip these resource-constrained MCUs with machine learning algorithms that provide always-on intelligence. Many Internet of Things applications consume time-series data, which are naturally suited to recurrent neural networks (RNNs) such as LSTMs and GRUs. However, RNNs can be large and difficult to deploy on these devices, which have only a few kilobytes of memory. As a result, there is a need for techniques that can significantly compress RNNs without negatively impacting task accuracy. This article introduces a method to compress RNNs for resource-constrained environments using the Kronecker product (KP). KPs can compress RNN layers by 16× to 38× with minimal accuracy loss. By quantizing the resulting models to 8 bits, we further push the compression factor to 50×. We compare KP with other state-of-the-art compression techniques across seven benchmarks spanning five different applications and show that KP can beat the task accuracy achieved by other techniques by a large margin while simultaneously improving the inference runtime. In some cases, KP compression itself introduces an accuracy loss; to mitigate this, we develop a hybrid KP approach. Our hybrid KP algorithm provides fine-grained control over the compression ratio, enabling us to regain accuracy lost during compression by adding a small number of model parameters.
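
To make the idea of Kronecker-product compression concrete, the following is a minimal sketch and not the article's implementation. It assumes a single weight matrix W is replaced by two smaller factors B and C with W = B ⊗ C, and it uses the standard identity (B ⊗ C) x = vec(B X Cᵀ) so that the large matrix is never materialized. The function names, the 256×256 layer size, and the 16×16 factor shapes are illustrative assumptions; the resulting ratio describes only this toy matrix, not the 16× to 38× layer-level figures reported above.

```python
import numpy as np


def kp_matvec(B, C, x):
    """Compute (B kron C) @ x without building the full Kronecker product.

    B has shape (m1, n1), C has shape (m2, n2), and x has length n1 * n2.
    With row-major reshaping, (B kron C) @ x equals (B @ X @ C.T) flattened,
    where X is x reshaped to (n1, n2).
    """
    m1, n1 = B.shape
    m2, n2 = C.shape
    X = x.reshape(n1, n2)
    return (B @ X @ C.T).reshape(m1 * m2)


def kp_compression_ratio(b_shape, c_shape):
    """Parameters of the dense matrix divided by parameters of the two factors."""
    m1, n1 = b_shape
    m2, n2 = c_shape
    dense_params = (m1 * m2) * (n1 * n2)
    factor_params = m1 * n1 + m2 * n2
    return dense_params / factor_params


# Illustrative sizes (assumed, not taken from the article):
# a 256 x 256 weight matrix expressed as the KP of two 16 x 16 factors.
rng = np.random.default_rng(0)
B = rng.standard_normal((16, 16))
C = rng.standard_normal((16, 16))
x = rng.standard_normal(256)

y_fast = kp_matvec(B, C, x)      # works only with the small factors
y_ref = np.kron(B, C) @ x        # reference: materializes the 256 x 256 matrix

print(np.allclose(y_fast, y_ref))
print(kp_compression_ratio(B.shape, C.shape))
```

In this sketch the factored form stores 512 values instead of 65,536, and the matrix-vector product is computed as two small matrix multiplications, which is one way a Kronecker structure can reduce both model size and inference work.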


Cited By

  • (2024) TinyNS: Platform-aware Neurosymbolic Auto Tiny Machine Learning. ACM Transactions on Embedded Computing Systems 23:3 (1-48). DOI: 10.1145/3603171. Online publication date: 11-May-2024
  • (2022) Machine Learning for Microcontroller-Class Hardware: A Review. IEEE Sensors Journal 22:22 (21362-21390). DOI: 10.1109/JSEN.2022.3210773. Online publication date: 15-Nov-2022
  • (2021) Opportunity++: A Multimodal Dataset for Video- and Wearable, Object and Ambient Sensors-Based Human Activity Recognition. Frontiers in Computer Science 3. DOI: 10.3389/fcomp.2021.792065. Online publication date: 20-Dec-2021

Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 17, Issue 4
October 2021
446 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3472280
Editor: Ramesh Karri
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 July 2021
Accepted: 01 November 2020
Revised: 01 September 2020
Received: 01 April 2020
Published in JETC Volume 17, Issue 4

Author Tags

  1. Neural networks
  2. micro-controllers
  3. matrix decomposition
  4. Kronecker products
  5. model compression
  6. IoT

Qualifiers

  • Research-article
  • Refereed
