
Multi-layer LSTM Parallel Optimization Based on Hardware and Software Cooperation

  • Conference paper
  • First Online:
Knowledge Science, Engineering and Management (KSEM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13369)


Abstract

LSTM’s gate structure and memory cell make it well suited to time-series problems, and it performs strongly in fields such as machine translation and reasoning. However, LSTM also has shortcomings, notably low parallelism, which limits its computing speed. Existing optimization efforts tend to address only one side, software or hardware: software approaches focus mostly on model accuracy, and CPU-accelerated LSTMs do not adapt dynamically to network characteristics, while customized hardware accelerators, although they can be tailored to the LSTM model structure, are often constrained by that structure and cannot fully exploit the hardware. This paper proposes a multi-layer LSTM optimization scheme based on software and hardware collaboration. We use row-wise pruning to greatly reduce the number of parameters while preserving accuracy, so that the model matches the parallel structure of the hardware. On the software side, we analyze the multi-layer LSTM module and find that some neurons in different layers can be computed in parallel; we therefore redesign the computation order of the multi-layer LSTM so that the model preserves its timing dependencies while remaining hardware friendly. Experiments show that throughput improves by 10x over a CPU implementation and by 1.2x-1.4x over other hardware accelerators, with improvements in latency and resource utilization as well.
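The cross-layer parallelism mentioned in the abstract follows from the dependency pattern of a stacked LSTM: the cell at (layer l, time step t) needs only the output of the same layer at step t-1 and the output of layer l-1 at step t, so all cells on an anti-diagonal l + t = const are mutually independent and can be evaluated together. The sketch below illustrates that wavefront ordering in Python. It is an illustration of the idea only, not the authors’ implementation; lstm_cell is a placeholder for the real gated update, and the names and toy dimensions are assumptions.

```python
import numpy as np

NUM_LAYERS, TIME_STEPS, HIDDEN = 3, 6, 4   # toy sizes, for illustration only

def lstm_cell(x, h_prev, c_prev):
    # Stand-in for the gated LSTM update; the real cell computes the
    # input/forget/output gates and the candidate cell state.
    h = np.tanh(x + h_prev)
    c = c_prev
    return h, c

def wavefront_schedule(inputs):
    # h[l][t+1] and c[l][t+1] hold the state of layer l after time step t;
    # index 0 is the initial (zero) state.
    h = np.zeros((NUM_LAYERS, TIME_STEPS + 1, HIDDEN))
    c = np.zeros_like(h)
    for k in range(NUM_LAYERS + TIME_STEPS - 1):        # wavefront index k = l + t
        ready = [(l, k - l) for l in range(NUM_LAYERS) if 0 <= k - l < TIME_STEPS]
        # Every cell in `ready` depends only on earlier wavefronts, so on an
        # accelerator they could be issued to separate compute units in one pass.
        for l, t in ready:
            x = inputs[t] if l == 0 else h[l - 1][t + 1]
            h[l][t + 1], c[l][t + 1] = lstm_cell(x, h[l][t], c[l][t])
    return h[-1, 1:]                                     # outputs of the top layer

outputs = wavefront_schedule(np.random.randn(TIME_STEPS, HIDDEN))
print(outputs.shape)   # (6, 4)
```

Under this ordering the number of sequential passes drops from NUM_LAYERS * TIME_STEPS to NUM_LAYERS + TIME_STEPS - 1, which matches the kind of reordered, hardware-friendly schedule the abstract describes; row-wise pruning then shrinks each cell’s weight matrices so the per-cell work fits the parallel hardware.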




Author information


Corresponding author

Correspondence to Jing Wu.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, Q., Wu, J., Huang, F., Han, Y., Zhao, Q. (2022). Multi-layer LSTM Parallel Optimization Based on Hardware and Software Cooperation. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds) Knowledge Science, Engineering and Management. KSEM 2022. Lecture Notes in Computer Science (LNAI), vol. 13369. Springer, Cham. https://doi.org/10.1007/978-3-031-10986-7_55


  • DOI: https://doi.org/10.1007/978-3-031-10986-7_55

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10985-0

  • Online ISBN: 978-3-031-10986-7

  • eBook Packages: Computer Science, Computer Science (R0)
