Research article
DOI: 10.1145/3453688.3461514

MT-DLA: An Efficient Multi-Task Deep Learning Accelerator Design

Published: 22 June 2021

Abstract

Multi-task learning systems are widely used in real-world AI applications such as intelligent robots and self-driving vehicles. Rather than improving the performance of a single network, this work proposes a specialized Multi-Task Deep Learning Accelerator architecture, MT-DLA, which improves the performance of concurrently running networks by exploiting the features and parameters shared across these models. Our evaluation on realistic multi-task workloads shows that MT-DLA dramatically reduces the memory and computation overhead caused by redundantly storing and recomputing the shared parameters, activations, and computation results. In experiments with real-world multi-task learning workloads, MT-DLA delivers a 1.4x-7.0x energy-efficiency improvement over a baseline neural network accelerator without multi-task support.
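As a rough software-level analogue of the sharing idea (a minimal sketch, not the MT-DLA hardware design itself): when concurrent tasks share a common backbone, the shared weights can be stored once and the shared activations computed once per input, with only the lightweight task-specific heads evaluated separately. The module and head names below (SharedBackbone, detection, segmentation) are illustrative assumptions, not taken from the paper.

```python
# Minimal PyTorch-style sketch of multi-task inference that reuses the shared
# backbone's parameters and activations across concurrent tasks.
# Illustrative only: MT-DLA realizes this sharing in hardware; the module and
# head names here are assumed for the example, not taken from the paper.
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class MultiTaskModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = SharedBackbone()          # shared parameters stored once
        self.heads = nn.ModuleDict({
            "detection":    nn.Conv2d(64, 8, 1),  # task-specific heads
            "segmentation": nn.Conv2d(64, 4, 1),
        })

    def forward(self, x):
        shared = self.backbone(x)                 # shared activations computed once
        # Each head reuses the same activation tensor instead of recomputing it.
        return {name: head(shared) for name, head in self.heads.items()}

if __name__ == "__main__":
    model = MultiTaskModel().eval()
    with torch.no_grad():
        outputs = model(torch.randn(1, 3, 64, 64))
    for task, out in outputs.items():
        print(task, tuple(out.shape))
```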

Supplemental Material

MP4 File
Presentation video. The presentation video for "MT-DLA: An Efficient Multi-Task Deep Learning Accelerator Design" covers the background of multi-task learning systems, the motivation for our work, and a description of the MT-DLA method, followed by our evaluation results and conclusions.


Published In

GLSVLSI '21: Proceedings of the 2021 Great Lakes Symposium on VLSI
June 2021
504 pages
ISBN:9781450383936
DOI:10.1145/3453688
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 June 2021

Badges

  • Honorable Mention

Author Tags

  1. coordinated compression
  2. deep learning accelerator
  3. multi-task learning

Qualifiers

  • Research-article

Data Availability

Presentation video (MP4), covering the background of multi-task learning systems, the motivation for our work, the MT-DLA method, and our evaluation results and conclusions: https://dl.acm.org/doi/10.1145/3453688.3461514#GLSVLSI21-130.mp4.mp4

Conference

GLSVLSI '21: Great Lakes Symposium on VLSI 2021
June 22-25, 2021
Virtual Event, USA

Acceptance Rates

Overall acceptance rate: 312 of 1,156 submissions (27%)

Article Metrics

  • Downloads (last 12 months): 31
  • Downloads (last 6 weeks): 1
Reflects downloads up to 15 Feb 2025

Cited By

  • Memory-Computing Decoupling: A DNN Multitasking Accelerator With Adaptive Data Arrangement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(11), 4112-4123, Nov. 2022. DOI: 10.1109/TCAD.2022.3197493
