DOI: 10.1145/3508352.3549402
Research Article
Public Access

Deep Learning Toolkit-Accelerated Analytical Co-Optimization of CNN Hardware and Dataflow

Published: 22 December 2022

Abstract

The continuous growth of CNN complexity not only intensifies the need for hardware acceleration but also presents a huge challenge: the solution space for CNN hardware design and dataflow mapping is not only enormously large but also discrete and poorly structured. Most previous works either rely on stochastic metaheuristics, such as genetic algorithms, which are typically very slow on large problems, or on expensive sampling, e.g., Gumbel-Softmax-based differentiable optimization and Bayesian optimization. We propose an analytical model for evaluating the power and performance of CNN hardware design and dataflow solutions. Based on this model, we introduce a co-optimization method consisting of nonlinear programming and parallel local search. A key innovation of the model is its matrix form, which enables the use of a deep learning toolkit for highly efficient computation of power/performance values and their gradients during optimization. In handling the power-performance tradeoff, our method can find better solutions than minimizing a weighted sum of power and latency. The average relative error of our model compared with Timeloop is as small as 1%. Compared to state-of-the-art methods, our approach achieves solutions with up to 1.7× shorter inference latency, 37.5% less power consumption, and 3× less area on ResNet-18. Moreover, it provides a 6.2× speedup in optimization runtime.
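To make the abstract's central idea concrete, the sketch below shows how a matrix-form analytical cost model can be evaluated in a deep learning toolkit (PyTorch) so that gradients with respect to a continuous relaxation of the hardware/dataflow variables come from autograd, followed by rounding and a discrete local-search step. It is a minimal illustration under assumed variable names and random coefficient matrices, not the authors' actual model or optimization flow; the weighted-sum loss appears only as the baseline scalarization the paper improves upon.

```python
import torch

# Hypothetical continuous relaxation of the discrete design variables, e.g. log2 of
# PE-array dimensions, buffer sizes, and loop-tiling factors (names are illustrative).
x = torch.tensor([5.0, 5.0, 14.0, 3.0, 3.0], requires_grad=True)

# Illustrative coefficient matrices of a matrix-form analytical model; in the paper's
# setting these would encode per-layer MAC counts, data-movement volumes, and
# energy-per-access constants rather than random numbers.
A_lat = torch.rand(8, 5)   # maps design variables to per-component latency terms
A_pwr = torch.rand(8, 5)   # maps design variables to per-component power terms

def evaluate(x):
    """Differentiable surrogate: latency and power as matrix functions of x."""
    latency = torch.logsumexp(A_lat @ x, dim=0)  # smooth max over bottleneck terms
    power = (A_pwr @ x).sum()
    return latency, power

# Gradient-based refinement of a scalarized objective; the weighted sum here is only
# a baseline, since the paper argues for better tradeoff handling than this.
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(200):
    latency, power = evaluate(x)
    loss = latency + 0.1 * power   # simple weighted-sum baseline
    opt.zero_grad()
    loss.backward()                # autograd supplies d(loss)/dx through the matrix model
    opt.step()

# Round the relaxed optimum back to integers; a (parallel) local search would then
# explore discrete neighbors of this point, re-scoring each with the same model.
candidate = torch.round(x.detach())
print(candidate)
```

Because every model evaluation reduces to dense matrix arithmetic, many candidate solutions can be scored in a single batched call on a GPU, which is what makes the local-search phase cheap in this style of flow.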

References

[1]
M. S. Abdelfattah, Ł. Dudziak, T. Chau, R. Lee, H. Kim, and N. D. Lane. 2020. Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator. In Design Automation Conference. 1--6.
[2]
Y.-H. Chen, T. Krishna, J. Emer, and V. Sze. 2017. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits 52, 1 (2017), 127--138.
[3]
K. Choi, D. Hong, H. Yoon, J. Yu, Y. Kim, and J. Lee. 2021. DANCE: Differentiable Accelerator/Network Co-Exploration. In Design Automation Conference. 337--342.
[4]
J. Cong, P. Wei, C. H. Yu, and P. Zhang. 2018. Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture. In Design Automation Conference. 1--6.
[5]
C. Deng, Y. Sui, S. Liao, X. Qian, and B. Yuan. 2021. GoSPA: An Energy-efficient High-performance Globally Optimized SParse Convolutional Neural Network Accelerator. In International Symposium on Computer Architecture. 1110--1123.
[6]
Y. Fu, Y. A. Zhang, Y. Zhang, D. Cox, and Y. Lin. 2021. Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators. arXiv:2106.06575
[7]
K. Hegde, H. Asghari-Moghaddam, M. Pellauer, N. Crago, A. Jaleel, E. Solomonik, J. Emer, and C. Fletcher. 2019. ExTensor: An Accelerator for Sparse Tensor Algebra. In International Symposium on Microarchitecture. 319--333.
[8]
K. Hegde, P. Tsai, S. Huang, V. Chandra, A. Parashar, and C. Fletcher. 2021. Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search. In International Conference on Architectural Support for Programming Languages and Operating Systems. 943--958.
[9]
Q. Huang, M. Kang, G. Dinh, T. Norell, A. Kalaiah, J. Demmel, J. Wawrzynek, and Y. Shao. 2021. CoSA: Scheduling by Constrained Optimization for Spatial Accelerators. In International Symposium on Computer Architecture. 554--566.
[10]
Y. Huang, Y. Cheng, A. Bapna, O. Firat, M. X. Chen, D. Chen, H. Lee, J. Ngiam, Q. V. Le, Y. Wu, and Z. Chen. 2019. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. In Conference on Neural Information Processing Systems. 103--112.
[11]
E. Jang, S. Gu, and B. Poole. 2017. Categorical Reparameterization with Gumbel-Softmax. arXiv:1611.01144
[12]
W. Jiang, L. Yang, E. Sha, Q. Zhuge, S. Gu, S. Dasgupta, Y. Shi, and J. Hu. 2020. Hardware/Software Co-Exploration of Neural Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2020), 1--6.
[13]
N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, et al. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In International Symposium on Computer Architecture. 1--12.
[14]
S. Kao, G. Jeong, and T. Krishna. 2020. ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning. In International Symposium on Microarchitecture. 622--636.
[15]
S.-C. Kao and T. Krishna. 2020. GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm. In International Conference On Computer Aided Design. 1--9.
[16]
H. Kwon, P. Chatarasi, V. Sarkar, T. Krishna, M. Pellauer, and A. Parashar. 2020. MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings. IEEE Micro 40, 3 (2020), 20--29.
[17]
H. Kwon, A. Samajdar, and T. Krishna. 2018. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects. ACM SIGPLAN Notices 53, 2 (2018), 461--475.
[18]
Y. Li, C. Hao, X. Zhang, X. Liu, Y. Chen, J. Xiong, W. Hwu, and D. Chen. 2020. EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions. In Design Automation Conference. 1--6.
[19]
Y. Lin, Z. Jiang, J. Gu, W. Li, S. Dhar, H. Ren, B. Khailany, and D. Pan. 2020. DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40, 4 (2020), 748--761.
[20]
A. Parashar, P. Raina, Y.-S. Shao, Y.-H. Chen, V. A. Ying, A. Mukkara, R. Venkatesan, B. Khailany, S. W. Keckler, and J. Emer. 2019. Timeloop: A Systematic Approach to DNN Accelerator Evaluation. In International Symposium on Performance Analysis of Systems and Software. 304--315.
[21]
A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S. Keckler, and W. Dally. 2017. SCNN: An Accelerator for Compressed-Sparse Convolutional Neural Networks. In International Symposium on Computer Architecture. 27--40.
[22]
M. Parsa, J. P. Mitchell, C. D. Schuman, R. M. Patton, T. E. Potok, and K. Roy. 2020. Bayesian Multi-objective Hyperparameter Optimization for Accurate, Fast, and Efficient Neural Network Accelerator Design. Frontiers in Neuroscience 14 (2020).
[23]
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems 32 (2019).
[24]
B. Reagen, J. M. Hernandez-Lobato, R. Adolf, M. Gelbart, P. Whatmough, G.-Y. Wei, and D. Brooks. 2017. A Case for Efficient Accelerator Design Space Exploration via Bayesian Optimization. In International Symposium on Low Power Electronics and Design. 1--6.
[25]
O. Sener and V. Koltun. 2018. Multi-Task Learning as Multi-Objective Optimization. Advances in Neural Information Processing Systems (2018), 525--536.
[26]
A. Stoutchinin, F. Conti, and L. Benini. 2019. Optimally Scheduling CNN Convolutions for Efficient Memory Access. arXiv:1902.01492
[27]
S. Venkataramani, J. Choi, et al. 2019. DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI Accelerator. IEEE Micro 39, 5 (2019), 102--111.
[28]
L. Waeijen, S. Sioutas, M. Peemen, M. Lindwer, and H. Corporaal. 2021. ConvFusion: A Model for Layer Fusion in Convolutional Neural Networks. IEEE Access 9 (2021), 168245--168267.
[29]
Y. N. Wu, P. A. Tsai, A. Parashar, V. Sze, and J. S. Emer. 2021. Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators. In International Symposium on Performance Analysis of Systems and Software. 232--234.
[30]
L. Yang, Z. Yan, M. Li, H. Kwon, L. Lai, T. Krishna, V. Chandra, W. Jiang, and Y. Shi. 2020. Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks. In Design Automation Conference. 1--6.
[31]
X. Yang, M. Gao, Q. Liu, J. Setter, J. Pu, A. Nayak, S. Bell, K. Cao, H. Ha, P. Raina, et al. 2020. Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators. In International Conference on Architectural Support for Programming Languages and Operating Systems. 369--383.
[32]
Y. Zhao, C. Li, Y. Wang, P. Xu, Y. Zhang, and Y. Lin. 2020. DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architectures. In International Conference on Acoustics, Speech and Signal Processing.
[33]
Y. Zhou, X. Dong, B. Akin, M. Tan, D. Peng, T. Meng, A. Yazdanbakhsh, D. Huang, and R. Narayanaswami. 2021. Rethinking Co-design of Neural Architectures and Hardware Accelerators. arXiv:2102.08619

Cited By

• (2023) Lightning Talk: Power and Performance Reconciliation – from Tradeoff to Win-Win. In 2023 60th ACM/IEEE Design Automation Conference (DAC), 1--2. DOI: 10.1109/DAC56929.2023.10247854. Online publication date: 9 July 2023.


    Published In

    ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
    October 2022
    1467 pages
    ISBN:9781450392174
    DOI:10.1145/3508352

    In-Cooperation

• IEEE-EDS: Electron Devices Society
    • IEEE CAS
    • IEEE CEDA

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 December 2022


    Author Tags

    1. CNN accelerator
    2. CNN compiler optimization
    3. CNN dataflow

    Qualifiers

    • Research-article

    Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

    Acceptance Rates

    Overall Acceptance Rate 457 of 1,762 submissions, 26%

Article Metrics

• Downloads (last 12 months): 195
• Downloads (last 6 weeks): 15

Reflects downloads up to 28 Feb 2025.

