research-article

PLACID: A Platform for FPGA-Based Accelerator Creation for DCNNs

Published: 18 September 2017

Abstract

Deep Convolutional Neural Networks (DCNNs) exhibit remarkable performance in a number of pattern recognition and classification tasks. Modern DCNNs involve many millions of parameters and billions of operations. Inference using such DCNNs, if implemented as software running on an embedded processor, incurs considerable execution time and energy consumption, which is prohibitive in many mobile applications. Field-programmable gate array (FPGA)-based acceleration of DCNN inference is a promising approach to improving both energy efficiency and classification throughput. However, the engineering effort required to develop and verify an optimized FPGA-based architecture is significant.

In this article, we present PLACID, an automated PLatform for Accelerator CreatIon for DCNNs. PLACID uses an analytical approach to characterize and explore the implementation space, enabling generation of the highest-throughput accelerator for a given DCNN on a specific target FPGA platform. It then generates an RTL-level architecture in Verilog, which can be passed on to commercial tools for FPGA implementation. PLACID is fully automated and reduces accelerator design time from a few months to a few hours. Experimental results show that architectures synthesized by PLACID yield 2× higher throughput density than the best competing approach.
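The abstract does not detail PLACID's internal algorithm, but the kind of analytical design-space exploration it refers to can be illustrated with a minimal sketch. The Python fragment below is an illustrative assumption rather than PLACID's actual implementation: it enumerates candidate unroll (parallelism) factors for a single convolutional layer, discards design points that exceed an assumed DSP budget, and keeps the point with the best purely compute-bound throughput estimate. All class names, parameters, and resource figures are hypothetical.

```python
# Minimal sketch of analytical design-space exploration for one conv layer.
# Not PLACID's algorithm; names, limits, and the cost model are assumptions.

from dataclasses import dataclass
from itertools import product


@dataclass
class ConvLayer:
    n_in: int    # number of input feature maps
    n_out: int   # number of output feature maps
    out_h: int   # output feature map height
    out_w: int   # output feature map width
    k: int       # kernel size (k x k)


@dataclass
class FpgaBudget:
    dsp: int           # available DSP slices (assumed one per parallel MAC)
    clock_mhz: float   # assumed achievable clock frequency


def explore(layer: ConvLayer, fpga: FpgaBudget):
    """Return (throughput, unroll_in, unroll_out) for the best feasible point."""
    total_macs = (layer.n_in * layer.n_out * layer.out_h *
                  layer.out_w * layer.k * layer.k)
    best = None
    for t_in, t_out in product(range(1, layer.n_in + 1),
                               range(1, layer.n_out + 1)):
        dsp_used = t_in * t_out
        if dsp_used > fpga.dsp:
            continue  # infeasible: exceeds the DSP budget
        # Compute-bound cycle estimate: total MACs over parallel MAC units,
        # ignoring on-chip buffering and off-chip memory stalls.
        cycles = total_macs / dsp_used
        throughput = fpga.clock_mhz * 1e6 / cycles  # layer evaluations per second
        if best is None or throughput > best[0]:
            best = (throughput, t_in, t_out)
    return best


if __name__ == "__main__":
    # Example: a layer roughly shaped like an early AlexNet convolution.
    layer = ConvLayer(n_in=96, n_out=256, out_h=27, out_w=27, k=5)
    fpga = FpgaBudget(dsp=2800, clock_mhz=150.0)
    print(explore(layer, fpga))
```

A full flow of the kind described in the abstract would also model on-chip buffer sizes and off-chip bandwidth before emitting Verilog for the chosen design point; this sketch captures only the compute-side trade-off.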



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 13, Issue 4
November 2017
362 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3129737

Copyright © 2017 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

        Publication History

        • Published: 18 September 2017
        • Accepted: 1 July 2017
        • Revised: 1 April 2017
        • Received: 1 November 2016
Published in TOMM Volume 13, Issue 4
