research-article

Bridge the Gap between Neural Networks and Neuromorphic Hardware with a Neural Network Compiler

Authors:
Yu Ji, Tsinghua University, Beijing, China
Youhui Zhang, Tsinghua University, Beijing, China
Wenguang Chen, Tsinghua University, Beijing, China
Yuan Xie, University of California at Santa Barbara, Santa Barbara, CA, USA

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, March 2018, pages 448–460. https://doi.org/10.1145/3173162.3173205

Published: 19 March 2018


ABSTRACT

Unlike developing neural networks (NNs) for general-purpose processors, development for NN chips usually faces hardware-specific restrictions, such as limited precision of network signals and parameters, constrained computation scale, and a limited set of supported non-linear functions. This paper proposes a general methodology to address these challenges. We decouple NN applications from the target hardware by introducing a compiler that transforms an existing trained, unrestricted NN into an equivalent network that meets the given hardware's constraints. We propose multiple techniques that make the transformation adaptable to different kinds of NN chips and reliable under strict hardware constraints. We have built such a software tool, which supports both spiking neural networks (SNNs) and traditional artificial neural networks (ANNs), and we have demonstrated its effectiveness with a fabricated neuromorphic chip and a processing-in-memory (PIM) design. Tests show that the inference error caused by this solution is insignificant and that the transformation time is much shorter than the retraining time. We also conduct parameter-sensitivity evaluations to explore the tradeoffs between network error and resource utilization for different transformation strategies, which could provide insights for the co-design optimization of neuromorphic hardware and software.
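The abstract names the hardware restrictions but does not describe how the transformations work. As a rough illustration of two of them (limited parameter precision and constrained computation scale), the Python sketch below shows generic uniform weight quantization to a chosen bit width and the splitting of a layer that is larger than one fixed-size crossbar into tiles whose partial sums are accumulated. This is a minimal sketch under stated assumptions, not the compiler described in the paper: the function names `quantize`, `dense_matvec`, and `tiled_matvec`, the parameters `bits` and `tile`, and the crossbar-tiling model are all hypothetical.

```python
import random


def quantize(weights, bits):
    """Uniform symmetric quantization of a weight matrix to `bits` bits (bits >= 2).

    Every weight is snapped to the nearest multiple of a per-matrix step and the
    de-quantized values are returned, so they can be compared with the originals.
    This is a generic scheme chosen for illustration, not the paper's method.
    """
    max_abs = max(abs(w) for row in weights for w in row) or 1.0
    levels = 2 ** (bits - 1) - 1  # positive levels, e.g. 7 for 4 bits
    step = max_abs / levels
    # round() rounds half to even; any per-weight error is at most step / 2.
    return [[round(w / step) * step for w in row] for row in weights]


def dense_matvec(weights, x):
    """Reference y = W x on unrestricted hardware."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def tiled_matvec(weights, x, tile):
    """Compute y = W x using only tile-by-tile blocks of W.

    Each block stands for one hypothetical fixed-size crossbar; the partial sums
    of all blocks in the same row band are accumulated into y.
    """
    m, n = len(weights), len(weights[0])
    y = [0.0] * m
    for r0 in range(0, m, tile):
        for c0 in range(0, n, tile):
            # One crossbar covers rows r0..r0+tile-1 and columns c0..c0+tile-1.
            for r in range(r0, min(r0 + tile, m)):
                y[r] += sum(weights[r][c] * x[c] for c in range(c0, min(c0 + tile, n)))
    return y


if __name__ == "__main__":
    random.seed(0)
    W = [[random.uniform(-1, 1) for _ in range(7)] for _ in range(10)]  # 10x7 layer
    x = [random.uniform(0, 1) for _ in range(7)]
    exact = dense_matvec(W, x)
    tiled = tiled_matvec(W, x, tile=4)  # 3 x 2 = 6 blocks of at most 4x4
    approx = dense_matvec(quantize(W, bits=4), x)
    print("max tiling error:", max(abs(a - b) for a, b in zip(exact, tiled)))
    print("max 4-bit error:", max(abs(a - b) for a, b in zip(exact, approx)))
```

In this toy model, tiling alone reproduces the dense result up to floating-point rounding, while quantization introduces a per-weight error of at most half a quantization step. How the paper's compiler keeps the resulting inference error small is not specified in the abstract.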

Index Terms

Computer systems organization → Architectures → Other architectures → Neural networks


Published in
ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
March 2018, 827 pages
ISBN: 9781450349116
DOI: 10.1145/3173162
General Chairs: Xipeng Shen (North Carolina State University, USA), James Tuck (North Carolina State University, USA)
Program Chairs: Ricardo Bianchini (Microsoft Research, USA), Vivek Sarkar (Georgia Institute of Technology, USA)

ACM SIGPLAN Notices, Volume 53, Issue 2 (ASPLOS '18)
February 2018, 809 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3296957
Editor: Matthew Fluet (Rochester Institute of Technology)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
accelerator
compiler
neural network
Acceptance Rates
ASPLOS '18 paper acceptance rate: 56 of 319 submissions, 18%. Overall acceptance rate: 535 of 2,713 submissions, 20%.