Research article
DOI: 10.1145/3453688.3461514

MT-DLA: An Efficient Multi-Task Deep Learning Accelerator Design

Published: 22 June 2021

Abstract

Multi-task learning systems are widely used in real-world AI applications such as intelligent robots and self-driving vehicles. Rather than improving the performance of a single network, this work proposes a specialized Multi-Task Deep Learning Accelerator architecture, MT-DLA, which improves the performance of concurrently running networks by exploiting the features and parameters shared across these models. Our evaluation on realistic multi-task workloads shows that MT-DLA dramatically reduces the memory and computation overhead caused by redundantly storing and recomputing the shared parameters, activations, and computation results. In experiments with real-world multi-task learning workloads, MT-DLA delivers a 1.4x-7.0x energy-efficiency improvement over a baseline neural network accelerator without multi-task support.
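As a rough software-level analogue of the sharing idea (a minimal sketch, not the MT-DLA hardware design itself): when concurrent tasks share a common backbone, the shared weights can be stored once and the shared activations computed once per input, with only the lightweight task-specific heads evaluated separately. The module and head names below (SharedBackbone, detection, segmentation) are illustrative assumptions, not taken from the paper.

```python
# Minimal PyTorch-style sketch of multi-task inference that reuses the shared
# backbone's parameters and activations across concurrent tasks.
# Illustrative only: MT-DLA realizes this sharing in hardware; the module and
# head names here are assumed for the example, not taken from the paper.
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class MultiTaskModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = SharedBackbone()          # shared parameters stored once
        self.heads = nn.ModuleDict({
            "detection":    nn.Conv2d(64, 8, 1),  # task-specific heads
            "segmentation": nn.Conv2d(64, 4, 1),
        })

    def forward(self, x):
        shared = self.backbone(x)                 # shared activations computed once
        # Each head reuses the same activation tensor instead of recomputing it.
        return {name: head(shared) for name, head in self.heads.items()}

if __name__ == "__main__":
    model = MultiTaskModel().eval()
    with torch.no_grad():
        outputs = model(torch.randn(1, 3, 64, 64))
    for task, out in outputs.items():
        print(task, tuple(out.shape))
```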

Supplemental Material

MP4 File
Presentation video. The presentation video for "MT-DLA: An Efficient Multi-Task Deep Learning Accelerator Design" covers the background of multi-task learning systems, the motivation for our work, and a description of the MT-DLA method, followed by our evaluation results and conclusions.


Published In

GLSVLSI '21: Proceedings of the 2021 Great Lakes Symposium on VLSI
June 2021
504 pages
ISBN:9781450383936
DOI:10.1145/3453688
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 June 2021

Badges

  • Honorable Mention

Author Tags

  1. coordinated compression
  2. deep learning accelerator
  3. multi-task learning

Qualifiers

  • Research-article

Data Availability

Presentation video (MP4), covering the background of multi-task learning systems, the motivation for our work, the MT-DLA method, and our evaluation results and conclusions: https://dl.acm.org/doi/10.1145/3453688.3461514#GLSVLSI21-130.mp4.mp4

Conference

GLSVLSI '21: Great Lakes Symposium on VLSI 2021
June 22-25, 2021
Virtual Event, USA

Acceptance Rates

Overall acceptance rate: 312 of 1,156 submissions (27%)

Article Metrics

  • Downloads (last 12 months): 31
  • Downloads (last 6 weeks): 1
Reflects downloads up to 15 Feb 2025

Cited By

  • Memory-Computing Decoupling: A DNN Multitasking Accelerator With Adaptive Data Arrangement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(11), 4112-4123, Nov. 2022. DOI: 10.1109/TCAD.2022.3197493
