DOI: 10.1145/3508352.3549342

Hardware Computation Graph for DNN Accelerator Design Automation without Inter-PU Templates

Published: 22 December 2022

Abstract

Existing deep neural network (DNN) accelerator design automation (ADA) methods adopt architecture templates that predetermine part of the design choices, and then explore the remaining choices beyond the templates. These templates can be classified into intra-processing-unit (intra-PU) templates and inter-PU templates according to the architecture hierarchy. Since templates limit the flexibility of ADA, designing effective ADA methods without templates has become an important research topic. Although some works have enhanced the flexibility of ADA by removing intra-PU templates, to the best of our knowledge no existing work has studied ADA methods without inter-PU templates. ADA with predetermined inter-PU templates is typically inefficient in terms of resource utilization, especially for DNNs with complex topologies. In this paper, we propose a novel method, called hardware computation graph (HCG), for ADA without inter-PU templates. Experiments show that the HCG method achieves competitive latency while using 1.4×~5× less on-chip memory, compared with existing state-of-the-art ADA methods.
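To make the abstract's central idea concrete, the sketch below illustrates one plausible reading of a hardware computation graph: the accelerator itself is represented as a graph whose nodes are PUs and whose edges are on-chip buffers, so the inter-PU topology and per-edge buffer sizes are searched rather than fixed by a template. This is a minimal illustration under stated assumptions, not the paper's actual formulation; all class names, fields, and the memory metric are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PU:
    """A processing unit: one node of the hardware computation graph."""
    name: str
    macs_per_cycle: int                                # assumed compute-capacity metric
    layers: List[str] = field(default_factory=list)    # DNN layers mapped onto this PU

@dataclass
class HCG:
    """Hardware computation graph: PUs as nodes, on-chip buffers as edges."""
    pus: Dict[str, PU] = field(default_factory=dict)
    buffers: Dict[Tuple[str, str], int] = field(default_factory=dict)  # bytes per edge

    def add_pu(self, pu: PU) -> None:
        self.pus[pu.name] = pu

    def connect(self, src: str, dst: str, buffer_bytes: int) -> None:
        # Any PU may feed any other PU: the inter-PU topology and each
        # buffer's size are free design choices, not fixed by a template.
        self.buffers[(src, dst)] = buffer_bytes

    def on_chip_memory(self) -> int:
        # Total on-chip buffer footprint, the quantity the abstract reports savings on.
        return sum(self.buffers.values())

# Map a residual block (a branching topology) onto two PUs plus an input
# interface. The skip connection becomes an explicit graph edge with its own
# buffer, instead of being forced through a templated linear pipeline.
g = HCG()
g.add_pu(PU("io", macs_per_cycle=0, layers=[]))                      # external input interface
g.add_pu(PU("pu0", macs_per_cycle=512, layers=["conv1", "conv2"]))
g.add_pu(PU("pu1", macs_per_cycle=256, layers=["add"]))
g.connect("io", "pu0", buffer_bytes=1024)    # block input to conv path
g.connect("io", "pu1", buffer_bytes=1024)    # skip-connection buffer
g.connect("pu0", "pu1", buffer_bytes=4096)   # conv-path activations
print(g.on_chip_memory())                    # 6144 bytes in this toy example
```

Under such a representation, a search procedure can add PUs, rewire edges, and size each buffer independently, which suggests how removing inter-PU templates can improve resource utilization for DNNs with complex topologies.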



Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022
1467 pages
ISBN: 9781450392174
DOI: 10.1145/3508352


In-Cooperation

  • IEEE-EDS: Electronic Devices Society
  • IEEE CAS
  • IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. DNN accelerator
  2. FPGA
  3. accelerator design automation
  4. hardware computation graph

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Program of China; NSFC

Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%
