ABSTRACT
Today, recurrent neural networks (RNNs) are used in various applications such as image captioning, speech recognition, and machine translation. However, because of data dependencies across time steps, RNNs are hard to parallelize. Furthermore, to increase accuracy, RNNs use complicated cell units such as the long short-term memory (LSTM) and the gated recurrent unit (GRU). To run such models on an embedded system, the size of the network model and the amount of computation must be reduced to achieve low power consumption and low memory bandwidth. In this paper, an implementation of an RNN based on the GRU with a logarithmic quantization method is proposed. The proposed design is synthesized using high-level synthesis (HLS) targeting a Xilinx ZCU102 FPGA running at 100 MHz. With 8-bit log-quantization, it achieves 90.57% accuracy without re-training or fine-tuning, and its memory usage is 31% lower than that of an implementation using a 32-bit floating-point data representation.
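As a rough illustration of the quantization scheme the abstract describes, the sketch below maps each weight to a signed power of two, so that multiplications reduce to shifts in hardware. This is a minimal Python/NumPy sketch, not the paper's implementation: the bit-width encoding, the exponent range (`max_exp`), and the zero-handling are illustrative assumptions.

```python
import numpy as np

def log_quantize(x, bits=8, max_exp=0):
    """Sketch of logarithmic (power-of-two) quantization.

    Each value is mapped to sign(x) * 2^e, with the exponent e rounded
    to the nearest integer and clamped to the range the chosen bit
    width can encode. The encoding details here (max_exp, one code
    reserved for zero) are assumptions, not the paper's exact format.
    """
    sign = np.sign(x)
    levels = 2 ** (bits - 1) - 1          # distinct exponent levels
    min_exp = max_exp - levels + 1        # smallest representable exponent
    exp = np.round(np.log2(np.abs(x) + 1e-30))
    exp = np.clip(exp, min_exp, max_exp)
    q = sign * (2.0 ** exp)
    # Magnitudes below the smallest representable level quantize to zero.
    q[np.abs(x) < 2.0 ** (min_exp - 1)] = 0.0
    return q

# Example: quantize a GRU-sized weight matrix, check reconstruction error.
w = (np.random.randn(256, 256) * 0.1).astype(np.float32)
wq = log_quantize(w, bits=8)
print("mean abs error:", np.abs(w - wq).mean())
```

Because each quantized weight is a power of two, a weight-activation product in the datapath becomes a barrel shift of the activation, which is what makes this representation attractive for a low-power FPGA implementation.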
Index Terms
- Log-quantization on GRU networks