DOI: 10.1145/3400302.3415688
research-article

CU.POKer: placing DNNs on wafer-scale AI accelerator with optimal kernel sizing

Published: 17 December 2020

Abstract

The tremendous growth in deep learning (DL) applications has created an exponential demand for computing power, which has led to the rise of AI-specific hardware. Targeted at accelerating computation-intensive deep learning applications, AI hardware, including but not limited to GPGPUs, TPUs, and ASICs, has been adopted ubiquitously. As a result, domain-specific CAD tools play increasingly important roles and have become deeply involved in both the design and compilation stages of modern AI hardware. Recently, the ISPD 2020 contest introduced a special challenge targeting the physical mapping of neural network workloads onto the largest commercial deep learning accelerator, the CS-1 Wafer-Scale Engine (WSE). In this paper, we propose CU.POKer, a high-performance engine fully customized for the WSE's DNN workload placement challenge. A provably optimal placeable kernel candidate searching scheme and a data-flow-aware placement tool are developed to ensure state-of-the-art quality on real industrial benchmarks. Experimental results on the ISPD 2020 contest evaluation suites [1] demonstrate the superiority of our proposed framework over other contestants.

References

[1] "ISPD 2020 Contest: Wafer-Scale Deep Learning Accelerator Placement," https://www.cerebras.net/ispd-2020-contest/.
[2] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, "Mastering chess and shogi by self-play with a general reinforcement learning algorithm," 2017.
[3] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., "Language models are few-shot learners," arXiv preprint arXiv:2005.14165, 2020.
[4] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. MICCAI, 2015, pp. 234-241.
[5] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[6] D. Amodei and D. Hernandez, "AI and Compute," https://openai.com/blog/ai-and-compute/, May 16, 2018, accessed on 20-May-2020.
[7] M. James, M. Tom, P. Groeneveld, and V. Kibardin, "ISPD 2020 physical mapping of neural networks on a wafer-scale deep learning accelerator," in Proceedings of the 2020 International Symposium on Physical Design, ser. ISPD '20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 145-149.
[8] Y. Lin, S. Dhar, W. Li, H. Ren, B. Khailany, and D. Z. Pan, "DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement," in Proceedings of the 56th Annual Design Automation Conference 2019, 2019, pp. 1-6.
[9] A. Mirhoseini, A. Goldie, M. Yazgan, J. Jiang, E. Songhori, S. Wang, Y.-J. Lee, E. Johnson, O. Pathak, S. Bae et al., "Chip placement with deep reinforcement learning," arXiv preprint arXiv:2004.10746, 2020.
[10] H.-M. Chen, M. D. Wong, H. Zhou, F.-Y. Young, H. H. Yang, and N. Sherwani, "Integrated floorplanning and interconnect planning," in Layout Optimization in VLSI Design. Springer, 2001, pp. 1-18.
[11] E. F. Young, C. C. Chu, and Z. C. Shen, "Twin binary sequences: a nonredundant representation for general nonslicing floorplan," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 4, pp. 457-469, 2003.

Published In

ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
November 2020
1396 pages
ISBN:9781450380263
DOI:10.1145/3400302
General Chair: Yuan Xie
Sponsors

In-Cooperation

  • IEEE CAS
  • IEEE CEDA
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICCAD '20
Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Cited By

  • (2024) Wafer-Scale Computing: Advancements, Challenges, and Future Perspectives [Feature]. IEEE Circuits and Systems Magazine 24:1 (52-81). DOI: 10.1109/MCAS.2024.3349669. Online publication date: Sep-2025
  • (2022) Kernel Mapping Techniques for Deep Learning Neural Network Accelerators. Proceedings of the 2022 International Symposium on Physical Design (21-28). DOI: 10.1145/3505170.3506730. Online publication date: 13-Apr-2022
  • (2022) CU.POKer: Placing DNNs on WSE With Optimal Kernel Sizing and Efficient Protocol Optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:6 (1888-1901). DOI: 10.1109/TCAD.2021.3096458. Online publication date: Jun-2022
  • (2021) Physical Synthesis for Advanced Neural Network Processors. Proceedings of the 26th Asia and South Pacific Design Automation Conference (833-840). DOI: 10.1145/3394885.3431625. Online publication date: 18-Jan-2021
  • (2021) VLSI Structure-aware Placement for Convolutional Neural Network Accelerator Units. 2021 58th ACM/IEEE Design Automation Conference (DAC) (1117-1122). DOI: 10.1109/DAC18074.2021.9586294. Online publication date: 5-Dec-2021
