Abstract
Deep Convolutional Neural Networks (DCNNs) deliver remarkable performance on a wide range of pattern recognition and classification tasks. Modern DCNNs involve many millions of parameters and billions of operations, so inference with such networks, if implemented as software on an embedded processor, incurs execution times and energy consumption that are prohibitive in many mobile applications. Field-programmable gate array (FPGA)-based acceleration of DCNN inference is a promising approach to reducing energy consumption and increasing classification throughput. However, the engineering effort required to develop and verify an optimized FPGA-based architecture is significant.
In this article, we present PLACID, an automated PLatform for Accelerator CreatIon for DCNNs. PLACID uses an analytical approach to characterize and explore the implementation space, enabling generation of the highest-throughput accelerator for a given DCNN on a specific target FPGA platform. It then generates an RTL-level architecture in Verilog, which can be passed to commercial tools for FPGA implementation. PLACID is fully automated and reduces accelerator design time from a few months to a few hours. Experimental results show that architectures synthesized by PLACID yield 2× higher throughput density than the best competing approach.
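The analytical exploration described above can be pictured as enumerating candidate parallelism (unrolling) factors for a convolutional layer under the target device's resource budget and keeping the configuration with the best estimated throughput. The sketch below illustrates this idea only; the cycle-count model, the Tm/Tn unroll parameters, and the resource figures are illustrative assumptions, not PLACID's actual formulation.

```python
# Minimal sketch of an analytical design-space exploration loop in the spirit
# of PLACID. The cycle model, the Tm/Tn unroll parameters, and the resource
# figures below are illustrative assumptions, not PLACID's own model.
from itertools import product

# Illustrative shape of one convolutional layer: M output feature maps,
# N input feature maps, K x K kernels, R x C output feature-map size.
LAYER = dict(M=256, N=192, K=3, R=13, C=13)

DSP_BUDGET = 2800  # assumed DSP slices available on the target FPGA
DSP_PER_MAC = 1    # assumed DSPs per parallel multiply-accumulate unit

def cycles(layer, Tm, Tn):
    """Estimated cycles with Tm-way output-map and Tn-way input-map
    parallelism; ceiling division models under-utilized edge tiles."""
    m_iters = -(-layer["M"] // Tm)  # ceil(M / Tm)
    n_iters = -(-layer["N"] // Tn)  # ceil(N / Tn)
    return m_iters * n_iters * layer["K"] ** 2 * layer["R"] * layer["C"]

best = None
for Tm, Tn in product(range(1, 65), repeat=2):
    if Tm * Tn * DSP_PER_MAC > DSP_BUDGET:
        continue  # candidate exceeds the compute-resource budget
    c = cycles(LAYER, Tm, Tn)
    if best is None or c < best[0]:
        best = (c, Tm, Tn)

print(f"best unrolling: Tm={best[1]}, Tn={best[2]}, est. cycles={best[0]}")
```

A realistic model would also fold in on-chip memory and external-bandwidth constraints per layer; the appeal of such an analytical approach is that the enumeration runs in seconds, rather than requiring a synthesis run per candidate configuration.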