DOI: 10.1145/3373087.3375301

When Massive GPU Parallelism Ain't Enough: A Novel Hardware Architecture of 2D-LSTM Neural Network

Published: 24 February 2020

ABSTRACT

The Multidimensional Long Short-Term Memory (MD-LSTM) neural network extends the one-dimensional LSTM to data with more than one dimension, which allows MD-LSTM to achieve state-of-the-art results in various applications including handwritten text recognition, medical imaging, and many more. However, efficient implementation suffers from inherently sequential execution, which tremendously slows down both training and inference compared to other neural networks. This is the primary reason why intensive research involving MD-LSTM has stalled in recent years, despite large progress in microelectronics and architectures. The main goal of this work is to accelerate MD-LSTM inference, and thereby open the door to efficient training that can boost the adoption of MD-LSTM. With this research we advocate FPGAs as an alternative platform for deep learning that can offer a solution when the massive parallelism of GPUs does not deliver the performance required by the application. In this paper, we present the first hardware architecture for MD-LSTM. We conduct a systematic exploration of the precision vs. accuracy trade-off using a challenging dataset for historical document image binarization from the DIBCO 2017 contest and the well-known MNIST dataset for handwritten digit recognition. Based on our new architecture, we implement an FPGA-based accelerator that outperforms an NVIDIA K80 GPU implementation in runtime by up to 50x and in energy efficiency by up to 746x. At the same time, our accelerator demonstrates higher accuracy and comparable throughput compared with state-of-the-art FPGA-based implementations of multilayer perceptrons on the MNIST dataset.
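To make the sequential-execution bottleneck concrete, the following is a minimal sketch of a 2D-LSTM forward pass in PyTorch. It is not the authors' implementation: the gate formulation with separate forget gates for the top and left neighbours follows one common MD-LSTM variant, and all names, shapes, and the gate ordering are assumptions for illustration. The point it shows is that the cell at grid position (i, j) cannot be computed before its top (i-1, j) and left (i, j-1) neighbours are done.

    # Minimal, illustrative 2D-LSTM forward pass over an H x W grid (a sketch,
    # not the paper's architecture). The nested loops make the data dependency
    # explicit: state at (i, j) needs the states at (i-1, j) and (i, j-1).
    import torch

    def md_lstm_forward(x, W_x, W_h_top, W_h_left, b):
        # x: (H, W, D) input grid; weights map to 5*N gate pre-activations.
        # Assumed gate order: input, forget-top, forget-left, output, candidate.
        H, Wd, _ = x.shape
        N = b.shape[0] // 5
        h = torch.zeros(H, Wd, N)
        c = torch.zeros(H, Wd, N)
        for i in range(H):                       # rows, strictly in order
            for j in range(Wd):                  # columns, strictly in order
                h_top  = h[i - 1, j] if i > 0 else torch.zeros(N)
                h_left = h[i, j - 1] if j > 0 else torch.zeros(N)
                c_top  = c[i - 1, j] if i > 0 else torch.zeros(N)
                c_left = c[i, j - 1] if j > 0 else torch.zeros(N)
                gates = x[i, j] @ W_x + h_top @ W_h_top + h_left @ W_h_left + b
                ig, fg_t, fg_l, og, g = gates.chunk(5)
                ig, fg_t, fg_l, og = map(torch.sigmoid, (ig, fg_t, fg_l, og))
                g = torch.tanh(g)
                c[i, j] = fg_t * c_top + fg_l * c_left + ig * g   # cell state
                h[i, j] = og * torch.tanh(c[i, j])                # hidden state
        return h, c

On a GPU, at most the cells lying on a common anti-diagonal (no more than min(H, W) of them) can ever run in parallel, so massive data parallelism is largely wasted on this recurrence. This is the bottleneck the FPGA architecture presented in the paper targets.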


Published in

FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2020, 346 pages
ISBN: 9781450370998
DOI: 10.1145/3373087

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 125 of 627 submissions, 20%
