DOI: 10.1145/3508352.3549402
Research Article
Public Access

Deep Learning Toolkit-Accelerated Analytical Co-Optimization of CNN Hardware and Dataflow

Published: 22 December 2022

Abstract

The continuous growth of CNN complexity not only intensifies the need for hardware acceleration but also presents a huge challenge: the solution space for CNN hardware design and dataflow mapping is not only enormously large but also discrete and poorly structured. Most previous works either rely on stochastic metaheuristics, such as genetic algorithms, which are typically very slow on large problems, or on expensive sampling, e.g., Gumbel-Softmax-based differentiable optimization and Bayesian optimization. We propose an analytical model for evaluating the power and performance of CNN hardware design and dataflow solutions. Based on this model, we introduce a co-optimization method consisting of nonlinear programming and parallel local search. A key innovation of the model is its matrix form, which enables the use of a deep learning toolkit for highly efficient computation of power/performance values and their gradients during optimization. In handling the power-performance tradeoff, our method can find better solutions than minimizing a weighted sum of power and latency. The average relative error of our model compared with Timeloop is as small as 1%. Compared to state-of-the-art methods, our approach achieves solutions with up to 1.7× shorter inference latency, 37.5% less power consumption, and 3× less area on ResNet-18. Moreover, it provides a 6.2× speedup in optimization runtime.
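To make the abstract's central idea concrete, the sketch below shows how a matrix-form analytical cost model can be evaluated in a deep learning toolkit (PyTorch) so that gradients with respect to a continuous relaxation of the hardware/dataflow variables come from autograd, followed by rounding and a discrete local-search step. It is a minimal illustration under assumed variable names and random coefficient matrices, not the authors' actual model or optimization flow; the weighted-sum loss appears only as the baseline scalarization the paper improves upon.

```python
import torch

# Hypothetical continuous relaxation of the discrete design variables, e.g. log2 of
# PE-array dimensions, buffer sizes, and loop-tiling factors (names are illustrative).
x = torch.tensor([5.0, 5.0, 14.0, 3.0, 3.0], requires_grad=True)

# Illustrative coefficient matrices of a matrix-form analytical model; in the paper's
# setting these would encode per-layer MAC counts, data-movement volumes, and
# energy-per-access constants rather than random numbers.
A_lat = torch.rand(8, 5)   # maps design variables to per-component latency terms
A_pwr = torch.rand(8, 5)   # maps design variables to per-component power terms

def evaluate(x):
    """Differentiable surrogate: latency and power as matrix functions of x."""
    latency = torch.logsumexp(A_lat @ x, dim=0)  # smooth max over bottleneck terms
    power = (A_pwr @ x).sum()
    return latency, power

# Gradient-based refinement of a scalarized objective; the weighted sum here is only
# a baseline, since the paper argues for better tradeoff handling than this.
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(200):
    latency, power = evaluate(x)
    loss = latency + 0.1 * power   # simple weighted-sum baseline
    opt.zero_grad()
    loss.backward()                # autograd supplies d(loss)/dx through the matrix model
    opt.step()

# Round the relaxed optimum back to integers; a (parallel) local search would then
# explore discrete neighbors of this point, re-scoring each with the same model.
candidate = torch.round(x.detach())
print(candidate)
```

Because every model evaluation reduces to dense matrix arithmetic, many candidate solutions can be scored in a single batched call on a GPU, which is what makes the local-search phase cheap in this style of flow.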

References

[1]
M. S. Abdelfattah, Ł. Dudziak, T. Chau, R. Lee, H. Kim, and N. D. Lane. 2020. Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator. In Design Automation Conference. 1--6.
[2]
Y.-H. Chen, T. Krishna, J. Emer, and V. Sze. 2017. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits 52, 1 (2017), 127--138.
[3]
K. Choi, D. Hong, H. Yoon, J. Yu, Y. Kim, and J. Lee. 2021. DANCE: Differentiable Accelerator/Network Co-Exploration. In Design Automation Conference. 337--342.
[4]
J. Cong, P. Wei, C. H. Yu, and P. Zhang. 2018. Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture. In Design Automation Conference. 1--6.
[5]
C. Deng, Y. Sui, S. Liao, X. Qian, and B. Yuan. 2021. GoSPA: An Energy-efficient High-performance Globally Optimized SParse Convolutional Neural Network Accelerator. In International Symposium on Computer Architecture. 1110--1123.
[6]
Y. Fu, Y. A. Zhang, Y. Zhang, D. Cox, and Y. Lin. 2021. Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators. arXiv:2106.06575
[7]
K. Hegde, H. Asghari-Moghaddam, M. Pellauer, N. Crago, A. Jaleel, E. Solomonik, J. Emer, and C. Fletcher. 2019. ExTensor: An Accelerator for Sparse Tensor Algebra. In International Symposium on Microarchitecture. 319--333.
[8]
K. Hegde, P. Tsai, S. Huang, V. Chandra, A. Parashar, and C. Fletcher. 2021. Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search. In International Conference on Architectural Support for Programming Languages and Operating Systems. 943--958.
[9]
Q. Huang, M. Kang, G. Dinh, T. Norell, A. Kalaiah, J. Demmel, J. Wawrzynek, and Y. Shao. 2021. CoSA: Scheduling by Constrained Optimization for Spatial Accelerators. In International Symposium on Computer Architecture. 554--566.
[10]
Y. Huang, Y. Cheng, A. Bapna, O. Firat, M. X. Chen, D. Chen, H. Lee, J. Ngiam, Q. V. Le, Y. Wu, and Z. Chen. 2019. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. In Conference on Neural Information Processing Systems. 103--112.
[11]
E. Jang, S. Gu, and B. Poole. 2017. Categorical Reparameterization with Gumbel-Softmax. arXiv:1611.01144
[12]
W. Jiang, L. Yang, E. Sha, Q. Zhuge, S. Gu, S. Dasgupta, Y. Shi, and J. Hu. 2020. Hardware/Software Co-Exploration of Neural Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2020), 1--6.
[13]
N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, et al. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In International Symposium on Computer Architecture. 1--12.
[14]
S. Kao, G. Jeong, and T. Krishna. 2020. ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning. In International Symposium on Microarchitecture. 622--636.
[15]
S.-C. Kao and T. Krishna. 2020. GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm. In International Conference On Computer Aided Design. 1--9.
[16]
H. Kwon, P. Chatarasi, V. Sarkar, T. Krishna, M. Pellauer, and A. Parashar. 2020. MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings. IEEE Micro 40, 3 (2020), 20--29.
[17]
H. Kwon, A. Samajdar, and T. Krishna. 2018. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects. ACM SIGPLAN Notices 53, 2 (2018), 461--475.
[18]
Y. Li, C. Hao, X. Zhang, X. Liu, Y. Chen, J. Xiong, W. Hwu, and D. Chen. 2020. EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions. In Design Automation Conference. 1--6.
[19]
Y. Lin, Z. Jiang, J. Gu, W. Li, S. Dhar, H. Ren, B. Khailany, and D. Pan. 2020. DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40, 4 (2020), 748--761.
[20]
A. Parashar, P. Raina, Y.-S. Shao, Y.-H. Chen, V. A. Ying, A. Mukkara, R. Venkatesan, B. Khailany, S. W. Keckler, and J. Emer. 2019. Timeloop: A Systematic Approach to DNN Accelerator Evaluation. In International Symposium on Performance Analysis of Systems and Software. 304--315.
[21]
A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S. Keckler, and W. Dally. 2017. SCNN: An Accelerator for Compressed-Sparse Convolutional Neural Networks. In International Symposium on Computer Architecture. 27--40.
[22]
M. Parsa, J. P. Mitchell, C. D. Schuman, R. M. Patton, T. E. Potok, and K. Roy. 2020. Bayesian Multi-objective Hyperparameter Optimization for Accurate, Fast, and Efficient Neural Network Accelerator Design. Frontiers in Neuroscience 14 (2020).
[23]
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems 32 (2019).
[24]
B. Reagen, J. M. Hernandez-Lobato, R. Adolf, M. Gelbart, P. Whatmough, G.-Y. Wei, and D. Brooks. 2017. A Case for Efficient Accelerator Design Space Exploration via Bayesian Optimization. In International Symposium on Low Power Electronics and Design. 1--6.
[25]
O. Sener and V. Koltun. 2018. Multi-Task Learning as Multi-Objective Optimization. Advances in Neural Information Processing Systems (2018), 525--536.
[26]
A. Stoutchinin, F. Conti, and L. Benini. 2019. Optimally Scheduling CNN Convolutions for Efficient Memory Access. arXiv:1902.01492
[27]
S. Venkataramani, J. Choi, et al. 2019. DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI Accelerator. IEEE Micro 39, 5 (2019), 102--111.
[28]
L. Waeijen, S. Sioutas, M. Peemen, M. Lindwer, and H. Corporaal. 2021. ConvFusion: A Model for Layer Fusion in Convolutional Neural Networks. IEEE Access 9 (2021), 168245--168267.
[29]
Y. N. Wu, P. A. Tsai, A. Parashar, V. Sze, and J. S. Emer. 2021. Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators. In International Symposium on Performance Analysis of Systems and Software. 232--234.
[30]
L. Yang, Z. Yan, M. Li, H. Kwon, L. Lai, T. Krishna, V. Chandra, W. Jiang, and Y. Shi. 2020. Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks. In Design Automation Conference. 1--6.
[31]
X. Yang, M. Gao, Q. Liu, J. Setter, J. Pu, A. Nayak, S. Bell, K. Cao, H. Ha, P. Raina, et al. 2020. Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators. In International Conference on Architectural Support for Programming Languages and Operating Systems. 369--383.
[32]
Y. Zhao, C. Li, Y. Wang, P. Xu, Y. Zhang, and Y. Lin. 2020. DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architectures. In International Conference on Acoustics, Speech and Signal Processing.
[33]
Y. Zhou, X. Dong, B. Akin, M. Tan, D. Peng, T. Meng, A. Yazdanbakhsh, D. Huang, and R. Narayanaswami. 2021. Rethinking Co-design of Neural Architectures and Hardware Accelerators. arXiv:2102.08619

Cited By

• (2023) Lightning Talk: Power and Performance Reconciliation – from Tradeoff to Win-Win. In 2023 60th ACM/IEEE Design Automation Conference (DAC), 1--2. DOI: 10.1109/DAC56929.2023.10247854. Online publication date: 9 July 2023.


    Published In

    ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
    October 2022
    1467 pages
    ISBN:9781450392174
    DOI:10.1145/3508352

    In-Cooperation

• IEEE-EDS: Electron Devices Society
    • IEEE CAS
    • IEEE CEDA

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 December 2022


    Author Tags

    1. CNN accelerator
    2. CNN compiler optimization
    3. CNN dataflow

    Qualifiers

    • Research-article

    Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

    Acceptance Rates

    Overall Acceptance Rate 457 of 1,762 submissions, 26%

Article Metrics

• Downloads (last 12 months): 195
• Downloads (last 6 weeks): 15

Reflects downloads up to 28 Feb 2025.

