Abstract
The gate structure and memory cell of the LSTM make it well suited to time-series problems, and it performs strongly in fields such as machine translation and reasoning. However, LSTM also has shortcomings, notably low parallelism, which limits computing speed. Existing optimization approaches tend to focus on either software or hardware alone: software-side work concentrates mostly on model accuracy, and CPU-accelerated LSTMs do not adapt dynamically to network characteristics, while hardware-side work builds custom accelerators around the LSTM model structure, so these accelerators are often constrained by that structure and cannot fully exploit the hardware. This paper proposes a multi-layer LSTM optimization scheme based on hardware-software co-design. We apply row-wise pruning to greatly reduce the number of parameters while preserving accuracy, so that the model fits the parallel structure of the hardware. On the software side, we analyze the multi-layer LSTM and show that some neurons in different layers can be computed in parallel. We therefore redesign the computation order of the multi-layer LSTM so that the model preserves its own timing dependencies while remaining hardware friendly. Experiments show that our throughput is 10x higher than a CPU implementation; compared with other hardware accelerators, throughput improves by 1.2x-1.4x, and latency and resource utilization also improve.
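To make the cross-layer parallelism mentioned above concrete, the sketch below shows one plausible reading of the redesigned computation order: in a multi-layer LSTM, the cell at (layer l, step t) depends only on (l, t-1) and (l-1, t), so all cells on the same anti-diagonal l + t = k are mutually independent and can be dispatched in parallel. This is a minimal illustrative sketch, not the authors' implementation; the function names, the [i, f, g, o] gate ordering, and the NumPy-based cell are assumptions made for illustration only, and no pruning is applied here.

```python
import numpy as np

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """Standard LSTM cell; gate order in W, U, b is assumed to be [i, f, g, o]."""
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = 1.0 / (1.0 + np.exp(-z[0 * hidden:1 * hidden]))   # input gate
    f = 1.0 / (1.0 + np.exp(-z[1 * hidden:2 * hidden]))   # forget gate
    g = np.tanh(z[2 * hidden:3 * hidden])                 # candidate state
    o = 1.0 / (1.0 + np.exp(-z[3 * hidden:4 * hidden]))   # output gate
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

def wavefront_schedule(num_layers, num_steps):
    """Group (layer, step) cells by anti-diagonal; each group is parallelizable."""
    waves = []
    for k in range(num_layers + num_steps - 1):
        waves.append([(l, k - l) for l in range(num_layers) if 0 <= k - l < num_steps])
    return waves

def run_multilayer_lstm(inputs, params, hidden):
    """inputs: list of T input vectors; params[l] = (W, U, b) for layer l."""
    L, T = len(params), len(inputs)
    h = {(l, -1): np.zeros(hidden) for l in range(L)}
    c = {(l, -1): np.zeros(hidden) for l in range(L)}
    for wave in wavefront_schedule(L, T):
        # All cells in `wave` are mutually independent; on hardware they would
        # be mapped to parallel compute units. Here they simply run in order.
        for l, t in wave:
            x = inputs[t] if l == 0 else h[(l - 1, t)]
            h[(l, t)], c[(l, t)] = lstm_cell(x, h[(l, t - 1)], c[(l, t - 1)], *params[l])
    return [h[(L - 1, t)] for t in range(T)]
```

The dependency argument is the key point: because each anti-diagonal only reads results from earlier anti-diagonals, the per-timestep recurrence of every layer is respected while up to min(L, T) cells become available for concurrent execution.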
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chen, Q., Wu, J., Huang, F., Han, Y., Zhao, Q. (2022). Multi-layer LSTM Parallel Optimization Based on Hardware and Software Cooperation. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds) Knowledge Science, Engineering and Management. KSEM 2022. Lecture Notes in Computer Science, vol 13369. Springer, Cham. https://doi.org/10.1007/978-3-031-10986-7_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10985-0
Online ISBN: 978-3-031-10986-7