DOI: 10.1145/3362743.3362965

Skipping RNN State Updates without Retraining the Original Model

Published: 10 November 2019

ABSTRACT

Recurrent Neural Networks (RNNs) break a time-series input (or a sentence) into multiple time-steps (or words) and process it one time-step (word) at a time. However, not all of these time-steps (words) need to be processed to determine the final output accurately. Prior work has exploited this intuition by placing an additional predictor in front of the RNN model to prune time-steps that are not relevant. However, these approaches jointly train the predictor and the RNN model, allowing each to learn from the mistakes of the other. In this work we present a method to skip RNN time-steps without retraining or fine-tuning the original RNN model. Using an ideal predictor, we show that, even without retraining the original model, 45% of time-steps can be skipped on the SST dataset and 80% on the IMDB dataset without impacting model accuracy. We show that the skip decision is non-trivial by comparing against five baselines derived from domain knowledge. Finally, we present a case study on the cost and accuracy benefits of realizing such a predictor. On the SST dataset, this realistic predictor reduces computation by more than 25% with at most a 0.3% loss in accuracy, while being 40× smaller than the original RNN model.
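To make the setup concrete, below is a minimal PyTorch sketch (not the paper's implementation) of skip-aware inference: a small, separately trained predictor looks at the current word embedding and the frozen RNN's hidden state and decides whether to run the RNN cell or carry the state forward unchanged. The names SkipPredictor and skip_threshold, and the 32-unit gate, are illustrative assumptions; the paper's actual predictor architecture and training procedure may differ.

```python
# Minimal sketch of skipping RNN state updates around a frozen, pretrained RNN.
# Illustrative only; SkipPredictor and skip_threshold are hypothetical names.
import torch
import torch.nn as nn

class SkipPredictor(nn.Module):
    """Lightweight gate: given the current input embedding and the RNN hidden
    state, outputs the probability that this time-step can be skipped."""
    def __init__(self, embed_dim, hidden_dim):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(embed_dim + hidden_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid(),
        )

    def forward(self, x_t, h_t):
        return self.gate(torch.cat([x_t, h_t], dim=-1))

def skip_aware_inference(rnn_cell, predictor, embeddings, skip_threshold=0.5):
    """Run a frozen LSTMCell over a sequence, keeping the previous state
    whenever the predictor is confident the step can be skipped."""
    batch, seq_len, _ = embeddings.shape
    h = torch.zeros(batch, rnn_cell.hidden_size)
    c = torch.zeros(batch, rnn_cell.hidden_size)
    with torch.no_grad():  # the original RNN is never retrained or fine-tuned
        for t in range(seq_len):
            x_t = embeddings[:, t, :]
            p_skip = predictor(x_t, h)
            update = (p_skip < skip_threshold).float()  # 1 -> process, 0 -> skip
            h_new, c_new = rnn_cell(x_t, (h, c))
            h = update * h_new + (1 - update) * h       # skipped steps reuse old state
            c = update * c_new + (1 - update) * c
    return h

# Usage: a small gate wrapped around a frozen 300-d embedding, 512-unit LSTM cell.
rnn = nn.LSTMCell(input_size=300, hidden_size=512)
pred = SkipPredictor(embed_dim=300, hidden_dim=512)
final_state = skip_aware_inference(rnn, pred, torch.randn(2, 20, 300))
```

The property mirrored here is that the RNN cell's weights are left untouched; only the lightweight gate would be trained, so skipped time-steps simply carry the previous hidden and cell state forward.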


Published in

SenSys-ML 2019: Proceedings of the 1st Workshop on Machine Learning on Edge in Sensor Systems
November 2019, 47 pages
ISBN: 9781450370110
DOI: 10.1145/3362743

          Copyright © 2019 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

SenSys-ML 2019 paper acceptance rate: 7 of 14 submissions, 50%. Overall acceptance rate: 7 of 14 submissions, 50%.
