
ACDSE: A Design Space Exploration Method for CNN Accelerator based on Adaptive Compression Mechanism

Published: 09 November 2023

Abstract

Customized accelerators for Convolutional Neural Networks (CNNs) can achieve better energy efficiency than general-purpose computing platforms. However, designing a high-performance accelerator requires balancing a variety of parameters and physical constraints. The growing number of parameters and increasingly tight constraints complicate the design space, posing new challenges to the capacity and efficiency of design space exploration (DSE) methods. In this paper, we present ACDSE, a novel design space exploration method for optimizing the design process of CNN accelerators. ACDSE implements an adaptive compression mechanism that dynamically adjusts the search range and prunes low-value design points according to the exploration state, allowing it to focus on valuable subspaces while improving exploration capacity and efficiency. We apply ACDSE to the problem of CNN accelerator latency optimization. Experiments indicate that, under the most stringent constraints, ACDSE reduces latency by 1.39x-5.07x and improves efficiency by 2.07x-43.87x compared with prior DSE methods, demonstrating its superior adaptability to complicated design spaces.
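
The abstract describes adaptive compression at a high level: the search range is shrunk around promising regions and low-value design points are pruned as exploration proceeds. The sketch below illustrates this general idea in a simple randomized DSE loop; it is not the paper's ACDSE algorithm, and all names and parameters (evaluate_latency, compress_ratio, samples_per_iter) are hypothetical placeholders standing in for a real accelerator cost model and tuning knobs.

```python
# Illustrative sketch only: a generic adaptive-compression search loop,
# not a reproduction of the ACDSE method described in the paper.
import random

def evaluate_latency(design):
    """Placeholder cost model; a real flow would query an accelerator
    simulator or analytical model (hypothetical stand-in)."""
    return sum((x - 7) ** 2 for x in design)  # toy surrogate with a known optimum

def adaptive_compression_search(dims, lo=1, hi=16, iters=50,
                                compress_ratio=0.9, samples_per_iter=8):
    """Randomized search whose per-parameter range is compressed toward the
    best design found so far, pruning the low-value part of the space."""
    ranges = [(lo, hi)] * dims          # current search range per design parameter
    best, best_cost = None, float("inf")

    for _ in range(iters):
        # Sample candidate design points inside the current (compressed) ranges.
        candidates = [tuple(random.randint(l, h) for l, h in ranges)
                      for _ in range(samples_per_iter)]

        # Evaluate candidates and keep the best point seen so far.
        for cand in candidates:
            cost = evaluate_latency(cand)
            if cost < best_cost:
                best, best_cost = cand, cost

        # Adaptive compression: shrink each parameter's range toward the best
        # point, discarding the remaining low-value region of the space.
        new_ranges = []
        for (l, h), center in zip(ranges, best):
            span = max(1, int((h - l) * compress_ratio))
            new_lo = max(lo, center - span // 2)
            new_hi = min(hi, new_lo + span)
            new_ranges.append((new_lo, new_hi))
        ranges = new_ranges

    return best, best_cost

if __name__ == "__main__":
    design, latency = adaptive_compression_search(dims=4)
    print("best design:", design, "estimated latency:", latency)
```

In a real flow, evaluate_latency would be replaced by a cycle-level simulator or analytical model of the accelerator, and the compression schedule would be driven by the exploration state rather than a fixed ratio, as the abstract suggests.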


Cited By

  • (2025) CSDSE: An efficient design space exploration framework for deep neural network accelerator based on cooperative search. Neurocomputing, Article 129366. DOI: 10.1016/j.neucom.2025.129366
  • (2024) LCDSE: Enable Efficient Design Space Exploration for DCNN Accelerator Based on Layer Clustering. IEEE Transactions on Circuits and Systems II: Express Briefs 71, 10 (Oct. 2024), 4486-4490. DOI: 10.1109/TCSII.2024.3393986
  • (2023) Boomerang: Physical-Aware Design Space Exploration Framework on RISC-V SonicBOOM Microarchitecture. In 2023 IEEE 34th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 85-93. DOI: 10.1109/ASAP57973.2023.00026

      Published In

      ACM Transactions on Embedded Computing Systems, Volume 22, Issue 6
      November 2023, 428 pages
      ISSN: 1539-9087
      EISSN: 1558-3465
      DOI: 10.1145/3632298
      Editor: Tulika Mitra

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Publication History

      Published: 09 November 2023
      Online AM: 28 June 2022
      Accepted: 12 June 2022
      Revised: 12 April 2022
      Received: 31 December 2021
      Published in TECS Volume 22, Issue 6


      Author Tags

      1. CNN accelerator
      2. design space exploration
      3. reinforcement learning
      4. derivative-free optimization

      Qualifiers

      • Research-article
