DOI: 10.1145/3453688.3461747

A Comprehensive Analysis of Low-Impact Computations in Deep Learning Workloads

Published: 22 June 2021

Abstract

Deep Neural Networks (DNNs) have achieved great success in machine learning tasks across a wide range of domains. Although multiple hardware platforms are available, such as GPUs, CPUs, and FPGAs, CPUs remain a preferred choice for machine learning applications, especially in low-power and resource-constrained environments such as embedded systems. In such environments, however, power and performance efficiency become critical concerns when applying DNN techniques. An attractive optimization for DNNs is to remove redundant computations and thereby improve execution efficiency. To this end, this paper conducts extensive experiments and analyses on popular state-of-the-art deep learning models. The experimental results cover the numbers of instructions, branches, branch prediction misses, and cache misses during the execution of the models. We also investigate the performance and sparsity of each layer in the models. Based on the analysis results, this paper proposes an instruction-level optimization that achieves performance improvements ranging from 10.26% to 28.00% for certain convolution layers.
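The paper's implementation is not reproduced on this page, but as a minimal sketch of the kind of instruction-level optimization the abstract describes, the hypothetical C routine below guards each multiply-accumulate in a naive convolution inner loop with a zero test, skipping the low-impact work when the input activation is zero. The function name, signature, and data layout are illustrative assumptions, not the authors' code.

    #include <stddef.h>

    /* Hypothetical sketch (not the authors' implementation): a naive 2D
     * convolution over an H x W input with a K x K kernel. Each
     * multiply-accumulate is guarded by a zero test on the input
     * activation, so a zero operand costs a compare-and-branch instead
     * of arithmetic instructions. */
    void conv2d_skip_zero(const float *in, const float *w, float *out,
                          size_t H, size_t W, size_t K)
    {
        size_t out_w = W - K + 1;
        for (size_t y = 0; y + K <= H; y++) {
            for (size_t x = 0; x + K <= W; x++) {
                float acc = 0.0f;
                for (size_t ky = 0; ky < K; ky++) {
                    for (size_t kx = 0; kx < K; kx++) {
                        float a = in[(y + ky) * W + (x + kx)];
                        if (a != 0.0f)            /* skip low-impact work */
                            acc += a * w[ky * K + kx];
                    }
                }
                out[y * out_w + x] = acc;
            }
        }
    }

Whether such a guard pays off depends on the layer's sparsity: where ReLU leaves many zero activations, the saved multiplies dominate, while in dense layers the extra branches (and their mispredictions) can erase the gain — the trade-off the supplemental summary below describes.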

Supplemental Material

MP4 File
The report introduces a comprehensive analysis of the hardware characteristics and layer-wise performance of representative DNNs on a SIMD-CPU architecture, including the numbers of instructions, branches, branch prediction misses, and cache misses, as well as layer-wise time performance and sparsity. Based on the analysis results, we propose an instruction-level optimization that achieves performance improvements ranging from 10.26% to 28.00% for certain convolution layers. Although the proposal reduces the number of instructions, it also introduces many branch instructions and branch mispredictions. This points to an interesting research direction for the future design of DNN accelerators: a dedicated branch predictor for DNNs. The research provides a guideline for optimizing DNNs on CPUs with SIMD extensions, as well as for potential hardware solutions based on FPGAs and heterogeneous accelerators.
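As a companion illustration (again hypothetical, not taken from the paper), layer-wise sparsity of the kind the analysis reports can be estimated by simply counting zero elements in a layer's activation tensor; a high ratio suggests zero-skipping will remove many instructions for that layer.

    #include <stddef.h>

    /* Hypothetical helper: fraction of zero elements in one layer's
     * activation tensor of n elements, i.e. an estimate of that layer's
     * sparsity. */
    double layer_sparsity(const float *act, size_t n)
    {
        size_t zeros = 0;
        for (size_t i = 0; i < n; i++)
            if (act[i] == 0.0f)
                zeros++;
        return n ? (double)zeros / (double)n : 0.0;
    }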





    Published In

    GLSVLSI '21: Proceedings of the 2021 Great Lakes Symposium on VLSI
    June 2021
    504 pages
    ISBN:9781450383936
    DOI:10.1145/3453688


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. comprehensive analysis
    2. deep learning neural network
    3. instruction-level
    4. model optimization
    5. simd-cpu-architecture

    Qualifiers

    • Research-article

    Data Availability

    Supplemental video (MP4): https://dl.acm.org/doi/10.1145/3453688.3461747#GLSVLSI21-vlsi32s.mp4

    Conference

    GLSVLSI '21: Great Lakes Symposium on VLSI 2021
    June 22 - 25, 2021
    Virtual Event, USA

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%



    Cited By

    • (2024) Deep Convolutional Neural Networks Based on Knowledge Distillation for Offline Handwritten Chinese Character Recognition. Journal of Advanced Computational Intelligence and Intelligent Informatics 28(2), 231-238. DOI: 10.20965/jaciii.2024.p0231. Online publication date: 20-Mar-2024.
    • (2024) Personalized Gait Generation Using Convolutional Neural Network for Lower Limb Rehabilitation Robots. 2024 IEEE International Conference on Real-time Computing and Robotics (RCAR), 617-622. DOI: 10.1109/RCAR61438.2024.10670989. Online publication date: 24-Jun-2024.
    • (2024) Special Session: Estimation and Optimization of DNNs for Embedded Platforms. 2024 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 21-30. DOI: 10.1109/CODES-ISSS60120.2024.00013. Online publication date: 29-Sep-2024.
    • (2024) Improved yolov5 algorithm combined with depth camera and embedded system for blind indoor visual assistance. Scientific Reports 14(1). DOI: 10.1038/s41598-024-74416-2. Online publication date: 3-Oct-2024.
    • (2023) Deep Learning Architecture Improvement Based on Dynamic Pruning and Layer Fusion. Electronics 12(5), 1208. DOI: 10.3390/electronics12051208. Online publication date: 2-Mar-2023.
    • (2023) Model Compression for Deep Neural Networks: A Survey. Computers 12(3), 60. DOI: 10.3390/computers12030060. Online publication date: 12-Mar-2023.
    • (2023) Particle Swarm Optimization-Based Convolutional Neural Network for Handwritten Chinese Character Recognition. Journal of Advanced Computational Intelligence and Intelligent Informatics 27(2), 165-172. DOI: 10.20965/jaciii.2023.p0165. Online publication date: 20-Mar-2023.
    • (2023) An Ultralightweight Object Detection Network for Empty-Dish Recycling Robots. IEEE Transactions on Instrumentation and Measurement 72, 1-12. DOI: 10.1109/TIM.2023.3241078. Online publication date: 2023.
    • (2023) Convolutional Neural Network Compression Method Based on Multi-Factor Channel Pruning. 2023 9th International Conference on Systems and Informatics (ICSAI), 1-6. DOI: 10.1109/ICSAI61474.2023.10423320. Online publication date: 16-Dec-2023.
    • (2022) YOLO-GD: A Deep Learning-Based Object Detection Algorithm for Empty-Dish Recycling Robots. Machines 10(5), 294. DOI: 10.3390/machines10050294. Online publication date: 22-Apr-2022.
