DOI: 10.1145/3508352.3549342

Hardware Computation Graph for DNN Accelerator Design Automation without Inter-PU Templates

Published: 22 December 2022

Abstract

Existing deep neural network (DNN) accelerator design automation (ADA) methods adopt architecture templates that predetermine part of the design choices, and then explore the remaining choices beyond the templates. These templates can be classified into intra-processing-unit (intra-PU) templates and inter-PU templates according to the architecture hierarchy. Since templates limit the flexibility of ADA, designing effective ADA methods without templates has become an important research topic. Although some works have enhanced the flexibility of ADA by removing intra-PU templates, to the best of our knowledge no existing work has studied ADA methods without inter-PU templates. ADA with predetermined inter-PU templates is typically inefficient in terms of resource utilization, especially for DNNs with complex topologies. In this paper, we propose a novel method, called hardware computation graph (HCG), for ADA without inter-PU templates. Experiments show that the HCG method achieves competitive latency while using 1.4×~5× less on-chip memory, compared with existing state-of-the-art ADA methods.
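To make the abstract's central idea concrete, the sketch below illustrates one plausible reading of a hardware computation graph: the accelerator itself is represented as a graph whose nodes are PUs and whose edges are on-chip buffers, so the inter-PU topology and per-edge buffer sizes are searched rather than fixed by a template. This is a minimal illustration under stated assumptions, not the paper's actual formulation; all class names, fields, and the memory metric are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PU:
    """A processing unit: one node of the hardware computation graph."""
    name: str
    macs_per_cycle: int                                # assumed compute-capacity metric
    layers: List[str] = field(default_factory=list)    # DNN layers mapped onto this PU

@dataclass
class HCG:
    """Hardware computation graph: PUs as nodes, on-chip buffers as edges."""
    pus: Dict[str, PU] = field(default_factory=dict)
    buffers: Dict[Tuple[str, str], int] = field(default_factory=dict)  # bytes per edge

    def add_pu(self, pu: PU) -> None:
        self.pus[pu.name] = pu

    def connect(self, src: str, dst: str, buffer_bytes: int) -> None:
        # Any PU may feed any other PU: the inter-PU topology and each
        # buffer's size are free design choices, not fixed by a template.
        self.buffers[(src, dst)] = buffer_bytes

    def on_chip_memory(self) -> int:
        # Total on-chip buffer footprint, the quantity the abstract reports savings on.
        return sum(self.buffers.values())

# Map a residual block (a branching topology) onto two PUs plus an input
# interface. The skip connection becomes an explicit graph edge with its own
# buffer, instead of being forced through a templated linear pipeline.
g = HCG()
g.add_pu(PU("io", macs_per_cycle=0, layers=[]))                      # external input interface
g.add_pu(PU("pu0", macs_per_cycle=512, layers=["conv1", "conv2"]))
g.add_pu(PU("pu1", macs_per_cycle=256, layers=["add"]))
g.connect("io", "pu0", buffer_bytes=1024)    # block input to conv path
g.connect("io", "pu1", buffer_bytes=1024)    # skip-connection buffer
g.connect("pu0", "pu1", buffer_bytes=4096)   # conv-path activations
print(g.on_chip_memory())                    # 6144 bytes in this toy example
```

Under such a representation, a search procedure can add PUs, rewire edges, and size each buffer independently, which suggests how removing inter-PU templates can improve resource utilization for DNNs with complex topologies.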



Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022
1467 pages
ISBN: 9781450392174
DOI: 10.1145/3508352


In-Cooperation

  • IEEE-EDS: Electronic Devices Society
  • IEEE CAS
  • IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. DNN accelerator
  2. FPGA
  3. accelerator design automation
  4. hardware computation graph

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Program of China; NSFC

Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%
