research-article

Addressing the issue of processing element under-utilization in general-purpose systolic deep learning accelerators

Authors:

Xiaowei Li

ASPDAC '19: Proceedings of the 24th Asia and South Pacific Design Automation Conference

Pages 733 - 738

https://doi.org/10.1145/3287624.3287638

Published: 21 January 2019

Abstract

As an energy-efficient hardware solution for deep neural network (DNN) inference, systolic accelerators are popular in both embedded and datacenter computing. Despite their performance and energy efficiency, systolic DNN accelerators face an inherent resource under-utilization problem: not all DNN models map well onto the fixed array of processing elements (PEs) in a systolic implementation, because typical DNN models vary significantly from application to application. Consequently, state-of-the-art hardware solutions cannot be expected to deliver their claimed nominal (peak) performance and energy efficiency. To address this problem, this study proposes a novel systolic DNN accelerator with a flexible computation mapping and dataflow scheme. By providing three types of parallelism (channel-direction mapping, planar mapping, and hybrid mapping) and switching among them dynamically, the accelerator can adapt a variety of DNN models to the fixed hardware resources. It can thus flexibly exploit the available PEs and data reuse across a wide range of DNN models to achieve optimal performance and energy efficiency.
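
The under-utilization problem described in the abstract can be illustrated with a back-of-the-envelope model. The sketch below is not the paper's method. It assumes a hypothetical 16 x 16 PE array, assumes that channel-direction mapping spreads input and output channels over the array's rows and columns, and assumes that planar mapping spreads the output feature map's height and width over them. All function names and layer shapes are illustrative, and the hybrid mode is not modeled.

```python
from math import ceil


def tiled_utilization(dim_a: int, dim_b: int, rows: int, cols: int) -> float:
    """Fraction of PE slots doing useful work when a dim_a x dim_b workload
    is cut into rows x cols tiles (edge tiles may be partly empty)."""
    tiles = ceil(dim_a / rows) * ceil(dim_b / cols)
    return (dim_a * dim_b) / (tiles * rows * cols)


def channel_mapping_util(c_in: int, c_out: int, rows: int, cols: int) -> float:
    # ASSUMPTION: input channels go to rows, output channels to columns.
    return tiled_utilization(c_in, c_out, rows, cols)


def planar_mapping_util(out_h: int, out_w: int, rows: int, cols: int) -> float:
    # ASSUMPTION: output height goes to rows, output width to columns.
    return tiled_utilization(out_h, out_w, rows, cols)


def pick_mapping(layer: dict, rows: int, cols: int) -> str:
    """Choose whichever of the two modeled mappings fills more PEs."""
    channel = channel_mapping_util(layer["c_in"], layer["c_out"], rows, cols)
    planar = planar_mapping_util(layer["out_h"], layer["out_w"], rows, cols)
    return "channel" if channel >= planar else "planar"


# Hypothetical layer shapes: a shallow, wide early layer and a deep, small late layer.
early_layer = {"c_in": 3, "c_out": 64, "out_h": 112, "out_w": 112}
late_layer = {"c_in": 512, "c_out": 512, "out_h": 7, "out_w": 7}

for name, layer in [("early", early_layer), ("late", late_layer)]:
    print(name, pick_mapping(layer, 16, 16))
```

Under these assumptions, the early layer keeps under 19% of the PEs busy with channel mapping but fills the array with planar mapping, and the late layer shows the opposite pattern. Neither fixed mapping suits both layers, which is the motivation the abstract gives for switching mappings dynamically.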

Published In

ASPDAC '19: Proceedings of the 24th Asia and South Pacific Design Automation Conference

January 2019

794 pages

ISBN: 9781450360074

DOI: 10.1145/3287624

General Chair:
Toshiyuki Shibuya
Fujitsu Laboratories

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

In-Cooperation

IEICE ESS: Institute of Electronics, Information and Communication Engineers, Engineering Sciences Society
IEEE CAS
IEEE CEDA
IPSJ SIG-SLDM: Information Processing Society of Japan, SIG System LSI Design Methodology

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Beijing Municipal Science and Technology Commission
Strategic Priority Research Program of the Chinese Academy of Sciences

Conference

ASPDAC '19

Sponsor:

SIGDA

ASPDAC '19: 24th Asia and South Pacific Design Automation Conference

January 21 - 24, 2019

Tokyo, Japan

Acceptance Rates

Overall Acceptance Rate 466 of 1,454 submissions, 32%

Article Metrics

Total Citations: 23
Total Downloads: 510
Downloads (last 12 months): 42
Downloads (last 6 weeks): 3

Reflects downloads up to 15 Feb 2025
