
Deep neural networks compiler for a trace-based accelerator (short WIP paper)

Authors:
Andre Xian Ming Chang (FWDNXT, USA),
Aliasger Zaidy (FWDNXT, USA),
Lukasz Burzawa (FWDNXT, USA),
Eugenio Culurciello (FWDNXT, USA)

LCTES 2018: Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, June 2018, Pages 89–93. https://doi.org/10.1145/3211332.3211333

Published: 19 June 2018


ABSTRACT

Deep Neural Networks (DNNs) are the algorithm of choice for image processing applications. DNNs present highly parallel workloads, which has led to the emergence of custom hardware accelerators. Because Deep Learning (DL) models are specialized for different tasks, they require programmable custom hardware and a compiler/mapper that translates each DNN into an efficient dataflow on the accelerator. This paper presents a compiler for running DNNs on Snowflake, a programmable hardware accelerator that targets DNNs. The compiler correctly generates instructions for several DL models: AlexNet, VGG, ResNet, and LightCNN9. Snowflake was implemented on an FPGA with a varying number of processing units to measure how the compiler and the accelerator perform as the design scales up. The system achieves 70 frames/s and 4.5 GB/s of off-chip memory bandwidth for AlexNet without linear layers on Xilinx’s Zynq-SoC XC7Z045 FPGA.
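
To put the two headline figures in perspective, the following is a minimal back-of-envelope sketch, not a result from the paper. It assumes that the 70 frames/s and 4.5 GB/s figures are sustained at the same time and that 1 GB means 10^9 bytes; neither assumption is stated in the abstract. The function names are hypothetical and chosen only for illustration.

```python
# Back-of-envelope estimate from the two figures quoted in the abstract.
# Assumptions (NOT stated in the source): both figures are sustained
# concurrently, and 1 GB = 10**9 bytes.

FRAMES_PER_SECOND = 70.0    # AlexNet without linear layers (from the abstract)
BANDWIDTH_GB_PER_S = 4.5    # off-chip memory bandwidth (from the abstract)


def traffic_per_frame_mb(bandwidth_gb_per_s: float, frames_per_s: float) -> float:
    """Average off-chip traffic per frame, in megabytes (10**6 bytes)."""
    bytes_per_second = bandwidth_gb_per_s * 1e9
    return bytes_per_second / frames_per_s / 1e6


def time_per_frame_ms(frames_per_s: float) -> float:
    """Average time budget per frame, in milliseconds (inverse of throughput)."""
    return 1000.0 / frames_per_s


if __name__ == "__main__":
    mb = traffic_per_frame_mb(BANDWIDTH_GB_PER_S, FRAMES_PER_SECOND)
    ms = time_per_frame_ms(FRAMES_PER_SECOND)
    print(f"about {mb:.1f} MB of off-chip traffic per frame")
    print(f"about {ms:.1f} ms per frame")
```

Under these assumptions the estimate works out to roughly 64 MB of off-chip traffic and about 14 ms per frame. The time figure is the inverse of throughput, so it is an average budget rather than a measured latency.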


Index Terms

Hardware → Integrated circuits → Reconfigurable logic and FPGAs → Hardware accelerators
Software and its engineering → Software notations and tools → Compilers

Published in
LCTES 2018: Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems
June 2018, 112 pages
ISBN: 9781450358033
DOI: 10.1145/3211332
General Chair: Zheng Zhang (Rutgers University, USA)
Program Chair: Christophe Dubach (University of Edinburgh, UK)

ACM SIGPLAN Notices, Volume 53, Issue 6 (LCTES '18)
June 2018, 112 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3299710
Editor: Matthew Fluet (Rochester Institute of Technology)

Copyright © 2018 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Compiler
DNN
accelerator
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 116 of 438 submissions, 26%