DOI: 10.1145/3287624.3287638
research-article

Addressing the issue of processing element under-utilization in general-purpose systolic deep learning accelerators

Published: 21 January 2019

Abstract

As an energy-efficient hardware solution for deep neural network (DNN) inference, systolic accelerators are popular in both embedded and datacenter computing scenarios. Despite their excellent performance and energy efficiency, systolic DNN accelerators inherently face a resource under-utilization problem: not every DNN model maps well onto the fixed array of processing elements (PEs) in a systolic implementation, because typical DNN models vary significantly from application to application. Consequently, state-of-the-art hardware solutions are not expected to deliver the nominal (peak) performance and energy efficiency they claim. To address this problem, this study proposes a novel systolic DNN accelerator with a flexible computation mapping and dataflow scheme. By providing three types of parallelism (channel-direction mapping, planar mapping, and hybrid) and dynamically switching among them, our accelerator can adapt various DNN models to the fixed hardware resources, and thus flexibly exploits PE provision and data reuse across a wide range of DNN models to achieve optimal performance and energy efficiency.
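
The under-utilization problem described in the abstract can be illustrated with a rough utilization model. The sketch below is not the paper's method. It assumes a simple tiling model in which two layer dimensions are spread across the rows and columns of a fixed PE array and partial edge tiles leave PEs idle. The function names, the choice of dimensions for each mapping (especially "hybrid"), and the layer shapes are hypothetical, chosen only for illustration.

```python
from math import ceil


def utilization(dim_rows: int, dim_cols: int, rows: int, cols: int) -> float:
    """Fraction of PE slots doing useful work when a dim_rows x dim_cols
    workload is tiled onto a rows x cols array (edge tiles are padded)."""
    padded = ceil(dim_rows / rows) * rows * ceil(dim_cols / cols) * cols
    return (dim_rows * dim_cols) / padded


def mapping_utilizations(c_in, c_out, h_out, w_out, rows, cols):
    """Utilization under three illustrative mappings (all assumptions):
    channel: input channels on rows, output channels on columns
    planar:  output height on rows, output width on columns
    hybrid:  input channels on rows, output width on columns"""
    return {
        "channel": utilization(c_in, c_out, rows, cols),
        "planar": utilization(h_out, w_out, rows, cols),
        "hybrid": utilization(c_in, w_out, rows, cols),
    }


def best_mapping(c_in, c_out, h_out, w_out, rows, cols):
    """Pick the mapping with the highest modeled utilization for one layer."""
    utils = mapping_utilizations(c_in, c_out, h_out, w_out, rows, cols)
    name = max(utils, key=utils.get)
    return name, utils[name]


if __name__ == "__main__":
    # Hypothetical layer shapes on a hypothetical 16 x 16 array.
    # Shallow layer: few input channels, large feature map.
    print(best_mapping(3, 64, 112, 112, 16, 16))
    # Deep layer: many channels, small feature map.
    print(best_mapping(512, 512, 7, 7, 16, 16))
```

Under this model, a layer with 3 input channels fills only 3 of 16 rows in the channel mapping (utilization 0.1875), while a 7 x 7 output map fills only 49 of 256 PEs in the planar mapping. This is why choosing the mapping per layer, rather than fixing one, can keep more PEs busy.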

Published In

ASPDAC '19: Proceedings of the 24th Asia and South Pacific Design Automation Conference
January 2019
794 pages
ISBN: 9781450360074
DOI: 10.1145/3287624
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Sponsors

In-Cooperation

  • IEICE ESS: Institute of Electronics, Information and Communication Engineers, Engineering Sciences Society
  • IEEE CAS
  • IEEE CEDA
  • IPSJ SIG-SLDM: Information Processing Society of Japan, SIG System LSI Design Methodology

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ASPDAC '19
Acceptance Rates

Overall Acceptance Rate 466 of 1,454 submissions, 32%

Article Metrics

  • Downloads (last 12 months): 42
  • Downloads (last 6 weeks): 3

Reflects downloads up to 15 Feb 2025.

Cited By

  • (2024) YOLOv7-BW: An Efficient Detector for Dense Small Objects in Remote Sensing Images. Intelligent Robot, 1:1 (39-54). DOI: 10.52810/JIR.2024.004. Online publication date: 30-May-2024
  • (2024) Research Progress in Antimicrobial Peptide Prediction Based on Machine Learning and Deep Learning. Frontiers and Applications of Artificial Intelligence, 1:1 (54-68). DOI: 10.52810/FAAI.2024.005. Online publication date: 15-Jun-2024
  • (2024) Color Fusion of Night-Vision Low-Light and Infrared Images Based on an Antagonistic Characteristic Model. Frontiers and Applications of Artificial Intelligence, 1:1 (45-53). DOI: 10.52810/FAAI.2024.004. Online publication date: 28-May-2024
  • (2024) Research Progress in Protein Structure Prediction Based on Machine Learning and Deep Learning. Frontiers and Applications of Artificial Intelligence, 1:1 (32-44). DOI: 10.52810/FAAI.2024.003. Online publication date: 20-May-2024
  • (2024) A GPS-Based Target Tracking Method Using a Stacked Serial LSTM Combined Neural Network. Frontiers and Applications of Artificial Intelligence, 1:1 (16-31). DOI: 10.52810/FAAI.2024.002. Online publication date: 18-Apr-2024
  • (2024) PipeCIM: A High-Throughput Computing-In-Memory Microprocessor With Nested Pipeline and RISC-V Extended Instructions. IEEE Transactions on Circuits and Systems I: Regular Papers, 71:7 (3214-3227). DOI: 10.1109/TCSI.2024.3384271. Online publication date: Jul-2024
  • (2024) ReDas: A Lightweight Architecture for Supporting Fine-Grained Reshaping and Multiple Dataflows on Systolic Array. IEEE Transactions on Computers, 73:8 (1997-2011). DOI: 10.1109/TC.2024.3398500. Online publication date: 1-Aug-2024
  • (2023) Rei: A Reconfigurable Interconnection Unit for Array-Based CNN Accelerators. IEEE Transactions on Emerging Topics in Computing, 11:4 (895-906). DOI: 10.1109/TETC.2023.3290138. Online publication date: Oct-2023
  • (2023) Enabling Fine-Grained Spatial Multitasking on Systolic-Array NPUs Using Dataflow Mirroring. IEEE Transactions on Computers, 72:12 (3383-3398). DOI: 10.1109/TC.2023.3299030. Online publication date: Dec-2023
  • (2023) DyNNamic: Dynamically Reshaping, High Data-Reuse Accelerator for Compact DNNs. IEEE Transactions on Computers, 72:3 (880-892). DOI: 10.1109/TC.2022.3184272. Online publication date: 1-Mar-2023
