DOI: 10.1145/3508546.3508649
Research article

An LSTM Acceleration Method Based on Embedded Neural Network Accelerator

Published: 25 February 2022

Abstract

As neural network technology matures, chips that accelerate neural network inference are appearing in large numbers. Faced with the complex operators (such as LSTM) that continue to emerge as neural network algorithms evolve, modifying the hardware design of an inference chip to support every new operator is impractical. Enabling existing hardware to support new operators through software therefore has significant research value and practical importance. We propose an LSTM acceleration method based on an embedded neural network accelerator: the LSTM operator is split in software into multiple basic operators that the accelerator already supports, and the resulting computation is then optimized, so that the embedded neural network accelerator executes LSTM operators quickly and efficiently. Experimental results show that the execution efficiency of the LRCN model deployed on a low-power accelerator is 1.6× and 1.3× higher than on a CPU and a GPU, respectively.
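The abstract does not give the paper's actual decomposition, but the general idea of splitting an LSTM step into primitives an accelerator already supports can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: one LSTM cell expressed only as matrix multiplies, adds, sigmoid/tanh activations, and elementwise products — the kinds of basic operators embedded NN accelerators typically provide. All function and variable names here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    # Basic activation available on most accelerators.
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_decomposed(x, h, c, W, U, b):
    """One LSTM time step built only from basic operators:
    matmul, add, sigmoid, tanh, and elementwise multiply.
    W: (input_dim, 4*hidden), U: (hidden, 4*hidden), b: (4*hidden,)."""
    # Fused gate pre-activations: two matmuls and one add.
    z = x @ W + h @ U + b
    # Split into input, forget, candidate, and output gate slices.
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gate activations
    g = np.tanh(g)                                # candidate cell state
    c_new = f * c + i * g                         # elementwise only
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Fusing the four gate projections into a single `(input_dim, 4*hidden)` matmul, as above, is a common optimization when mapping LSTM onto matrix-multiply hardware, since it replaces eight small matmuls with two large ones.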

Supplementary Material

Poster (LSTM_poster.pdf)


Cited By

  • (2024) High precision temperature measurement for cryogenic temperature sensors based on deep learning technology. Cryogenics 140, 103830. DOI: 10.1016/j.cryogenics.2024.103830. June 2024.
  • (2022) Sequential Characteristics Based Operators Disassembly Quantization Method for LSTM Layers. Applied Sciences 12(24), 12744. DOI: 10.3390/app122412744. December 2022.
  • (2022) A New Quantization Deployment Method of Neural Network Models Integrating LSTM Layers. 2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), 1299–1303. DOI: 10.1109/PRAI55851.2022.9904120. August 2022.


Published In

ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
December 2021
699 pages
ISBN:9781450385053
DOI:10.1145/3508546

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Acceleration
  2. Deep Learning
  3. LSTM
  4. Model deployment

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ACAI'21

Acceptance Rates

Overall Acceptance Rate 173 of 395 submissions, 44%

