When Massive GPU Parallelism Ain't Enough: A Novel Hardware Architecture of 2D-LSTM Neural Network

ABSTRACT
The Multidimensional Long Short-Term Memory (MD-LSTM) neural network is an extension of the one-dimensional LSTM to data with more than one dimension, which has allowed MD-LSTM to achieve state-of-the-art results in various applications, including handwritten text recognition and medical imaging. However, efficient implementation is hampered by the highly sequential execution of MD-LSTM, which tremendously slows down both training and inference compared to other neural networks. This is the primary reason that has prevented intensive research involving MD-LSTM in recent years, despite large progress in microelectronics and architectures. The main goal of this work is to accelerate MD-LSTM inference and thereby open the door to efficient training that can broaden the application of MD-LSTM. With this research we advocate that the FPGA is an alternative platform for deep learning that can offer a solution when the massive parallelism of GPUs does not deliver the performance required by the application. In this paper, we present the first hardware architecture for MD-LSTM. We conduct a systematic exploration of the precision vs. accuracy trade-off using a challenging dataset for historical document image binarization from the DIBCO 2017 contest, and the well-known MNIST dataset for handwritten digit recognition. Based on our new architecture, we implement an FPGA-based accelerator that outperforms an NVIDIA K80 GPU implementation by up to 50x in runtime and up to 746x in energy efficiency. At the same time, our accelerator demonstrates higher accuracy and comparable throughput compared with state-of-the-art FPGA-based implementations of multilayer perceptrons on the MNIST dataset.
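The sequential-execution bottleneck the abstract refers to comes from the 2D-LSTM recurrence itself: each cell consumes the hidden and cell states of its top and left neighbors, so only cells on the same anti-diagonal can ever be computed in parallel. The following is a minimal NumPy sketch of a 2D-LSTM cell in the style of Graves et al. (2007), not the paper's hardware architecture; all names, shapes, and the single fused weight matrix are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def md_lstm_2d(x, Wx, Wh1, Wh2, b, n_hidden):
    """Illustrative 2D-LSTM forward pass over an (H, W, n_in) input.

    States are stored in (H+1, W+1, n_hidden) arrays so that row/column 0
    hold the zero border states; cell (i, j) lives at index [i+1, j+1].
    """
    Hd, Wd, _ = x.shape
    h = np.zeros((Hd + 1, Wd + 1, n_hidden))
    c = np.zeros((Hd + 1, Wd + 1, n_hidden))
    for i in range(Hd):           # raster scan: cell (i, j) must wait for its
        for j in range(Wd):       # top (i-1, j) and left (i, j-1) neighbors
            z = x[i, j] @ Wx + h[i, j + 1] @ Wh1 + h[i + 1, j] @ Wh2 + b
            ig, f1, f2, og, g = np.split(z, 5)    # input, 2 forget, output, candidate
            ig, f1, f2, og = sigmoid(ig), sigmoid(f1), sigmoid(f2), sigmoid(og)
            # one forget gate per incoming dimension (top and left)
            c[i + 1, j + 1] = f1 * c[i, j + 1] + f2 * c[i + 1, j] + ig * np.tanh(g)
            h[i + 1, j + 1] = og * np.tanh(c[i + 1, j + 1])
    return h[1:, 1:]

# Toy usage with random weights (sizes are arbitrary)
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 8
x = rng.standard_normal((5, 4, n_in))
Wx = 0.1 * rng.standard_normal((n_in, 5 * n_hidden))
Wh1 = 0.1 * rng.standard_normal((n_hidden, 5 * n_hidden))
Wh2 = 0.1 * rng.standard_normal((n_hidden, 5 * n_hidden))
b = np.zeros(5 * n_hidden)
out = md_lstm_2d(x, Wx, Wh1, Wh2, b, n_hidden)
```

Because of the two-neighbor dependency, an H×W image admits at most min(H, W) concurrent cell evaluations (one wavefront per anti-diagonal), which is why GPU-style massive parallelism is poorly matched to this workload.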