ABSTRACT
Today, recurrent neural networks (RNNs) are used in various applications such as image captioning, speech recognition, and machine translation. However, because of data dependencies across time steps, RNNs are hard to parallelize. Furthermore, to increase accuracy, RNNs use complicated cell units such as the long short-term memory (LSTM) and the gated recurrent unit (GRU). To run such models on an embedded system, the size of the network model and the amount of computation must be reduced to achieve low power consumption and low memory bandwidth. In this paper, an implementation of an RNN based on the GRU with a logarithmic quantization method is proposed. The proposed design is synthesized using high-level synthesis (HLS) targeting a Xilinx ZCU102 FPGA running at 100 MHz. With 8-bit log-quantization, it achieves 90.57% accuracy without re-training or fine-tuning, and its memory usage is 31% lower than that of an implementation using a 32-bit floating-point data representation.
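As a rough illustration of the quantization scheme the abstract describes, the sketch below maps each weight to a signed power of two, so that multiplications reduce to shifts in hardware. This is a minimal Python/NumPy sketch, not the paper's implementation: the bit-width encoding, the exponent range (`max_exp`), and the zero-handling are illustrative assumptions.

```python
import numpy as np

def log_quantize(x, bits=8, max_exp=0):
    """Sketch of logarithmic (power-of-two) quantization.

    Each value is mapped to sign(x) * 2^e, with the exponent e rounded
    to the nearest integer and clamped to the range the chosen bit
    width can encode. The encoding details here (max_exp, one code
    reserved for zero) are assumptions, not the paper's exact format.
    """
    sign = np.sign(x)
    levels = 2 ** (bits - 1) - 1          # distinct exponent levels
    min_exp = max_exp - levels + 1        # smallest representable exponent
    exp = np.round(np.log2(np.abs(x) + 1e-30))
    exp = np.clip(exp, min_exp, max_exp)
    q = sign * (2.0 ** exp)
    # Magnitudes below the smallest representable level quantize to zero.
    q[np.abs(x) < 2.0 ** (min_exp - 1)] = 0.0
    return q

# Example: quantize a GRU-sized weight matrix, check reconstruction error.
w = (np.random.randn(256, 256) * 0.1).astype(np.float32)
wq = log_quantize(w, bits=8)
print("mean abs error:", np.abs(w - wq).mean())
```

Because each quantized weight is a power of two, a weight-activation product in the datapath becomes a barrel shift of the activation, which is what makes this representation attractive for a low-power FPGA implementation.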
Index Terms
- Log-quantization on GRU networks