DOI: 10.1145/3453688.3461747

A Comprehensive Analysis of Low-Impact Computations in Deep Learning Workloads

Published: 22 June 2021

Abstract

Deep Neural Networks (DNNs) have achieved great success in machine learning tasks across a wide range of domains. Although multiple hardware platforms are available, such as GPUs, CPUs, and FPGAs, CPUs remain a preferred choice for machine learning applications, especially in low-power and resource-constrained environments such as embedded systems. In such environments, however, power and performance efficiency become critical concerns when applying DNN techniques. An attractive optimization for DNNs is to remove redundant computations and thereby improve execution efficiency. To this end, this paper conducts extensive experiments and analyses on popular state-of-the-art deep learning models. The experimental results cover the numbers of instructions, branches, branch prediction misses, and cache misses during the execution of the models. We also investigate the performance and sparsity of each layer in the models. Based on the analysis results, this paper proposes an instruction-level optimization that achieves performance improvements ranging from 10.26% to 28.00% for certain convolution layers.
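The paper's implementation is not reproduced on this page, but as a minimal sketch of the kind of instruction-level optimization the abstract describes, the hypothetical C routine below guards each multiply-accumulate in a naive convolution inner loop with a zero test, skipping the low-impact work when the input activation is zero. The function name, signature, and data layout are illustrative assumptions, not the authors' code.

    #include <stddef.h>

    /* Hypothetical sketch (not the authors' implementation): a naive 2D
     * convolution over an H x W input with a K x K kernel. Each
     * multiply-accumulate is guarded by a zero test on the input
     * activation, so a zero operand costs a compare-and-branch instead
     * of arithmetic instructions. */
    void conv2d_skip_zero(const float *in, const float *w, float *out,
                          size_t H, size_t W, size_t K)
    {
        size_t out_w = W - K + 1;
        for (size_t y = 0; y + K <= H; y++) {
            for (size_t x = 0; x + K <= W; x++) {
                float acc = 0.0f;
                for (size_t ky = 0; ky < K; ky++) {
                    for (size_t kx = 0; kx < K; kx++) {
                        float a = in[(y + ky) * W + (x + kx)];
                        if (a != 0.0f)            /* skip low-impact work */
                            acc += a * w[ky * K + kx];
                    }
                }
                out[y * out_w + x] = acc;
            }
        }
    }

Whether such a guard pays off depends on the layer's sparsity: where ReLU leaves many zero activations, the saved multiplies dominate, while in dense layers the extra branches (and their mispredictions) can erase the gain — the trade-off the supplemental summary below describes.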

Supplemental Material

MP4 File
The report introduces a comprehensive analysis of the hardware characteristics and layer-wise performance of representative DNNs on a SIMD-CPU architecture, including the numbers of instructions, branches, branch prediction misses, and cache misses, as well as layer-wise time performance and sparsity. Based on the analysis results, we propose an instruction-level optimization that achieves performance improvements ranging from 10.26% to 28.00% for certain convolution layers. Although the proposal reduces the number of instructions, it also introduces many branch instructions and branch mispredictions. This points to an interesting research direction for the future design of DNN accelerators: a dedicated branch predictor for DNNs. The research provides a guideline for optimizing DNNs on CPUs with SIMD extensions, as well as for potential hardware solutions based on FPGAs and heterogeneous accelerators.
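As a companion illustration (again hypothetical, not taken from the paper), layer-wise sparsity of the kind the analysis reports can be estimated by simply counting zero elements in a layer's activation tensor; a high ratio suggests zero-skipping will remove many instructions for that layer.

    #include <stddef.h>

    /* Hypothetical helper: fraction of zero elements in one layer's
     * activation tensor of n elements, i.e. an estimate of that layer's
     * sparsity. */
    double layer_sparsity(const float *act, size_t n)
    {
        size_t zeros = 0;
        for (size_t i = 0; i < n; i++)
            if (act[i] == 0.0f)
                zeros++;
        return n ? (double)zeros / (double)n : 0.0;
    }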





    Published In

    GLSVLSI '21: Proceedings of the 2021 Great Lakes Symposium on VLSI
    June 2021
    504 pages
    ISBN:9781450383936
    DOI:10.1145/3453688


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. comprehensive analysis
    2. deep learning neural network
    3. instruction-level
    4. model optimization
    5. simd-cpu-architecture

    Qualifiers

    • Research-article

    Data Availability

    Supplemental video (MP4): https://dl.acm.org/doi/10.1145/3453688.3461747#GLSVLSI21-vlsi32s.mp4

    Conference

    GLSVLSI '21: Great Lakes Symposium on VLSI 2021
    June 22 - 25, 2021
    Virtual Event, USA

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%



    Cited By

    • (2024) Deep Convolutional Neural Networks Based on Knowledge Distillation for Offline Handwritten Chinese Character Recognition. Journal of Advanced Computational Intelligence and Intelligent Informatics 28(2), 231-238. DOI: 10.20965/jaciii.2024.p0231. Online publication date: 20-Mar-2024.
    • (2024) Personalized Gait Generation Using Convolutional Neural Network for Lower Limb Rehabilitation Robots. 2024 IEEE International Conference on Real-time Computing and Robotics (RCAR), 617-622. DOI: 10.1109/RCAR61438.2024.10670989. Online publication date: 24-Jun-2024.
    • (2024) Special Session: Estimation and Optimization of DNNs for Embedded Platforms. 2024 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 21-30. DOI: 10.1109/CODES-ISSS60120.2024.00013. Online publication date: 29-Sep-2024.
    • (2024) Improved yolov5 algorithm combined with depth camera and embedded system for blind indoor visual assistance. Scientific Reports 14(1). DOI: 10.1038/s41598-024-74416-2. Online publication date: 3-Oct-2024.
    • (2023) Deep Learning Architecture Improvement Based on Dynamic Pruning and Layer Fusion. Electronics 12(5), 1208. DOI: 10.3390/electronics12051208. Online publication date: 2-Mar-2023.
    • (2023) Model Compression for Deep Neural Networks: A Survey. Computers 12(3), 60. DOI: 10.3390/computers12030060. Online publication date: 12-Mar-2023.
    • (2023) Particle Swarm Optimization-Based Convolutional Neural Network for Handwritten Chinese Character Recognition. Journal of Advanced Computational Intelligence and Intelligent Informatics 27(2), 165-172. DOI: 10.20965/jaciii.2023.p0165. Online publication date: 20-Mar-2023.
    • (2023) An Ultralightweight Object Detection Network for Empty-Dish Recycling Robots. IEEE Transactions on Instrumentation and Measurement 72, 1-12. DOI: 10.1109/TIM.2023.3241078. Online publication date: 2023.
    • (2023) Convolutional Neural Network Compression Method Based on Multi-Factor Channel Pruning. 2023 9th International Conference on Systems and Informatics (ICSAI), 1-6. DOI: 10.1109/ICSAI61474.2023.10423320. Online publication date: 16-Dec-2023.
    • (2022) YOLO-GD: A Deep Learning-Based Object Detection Algorithm for Empty-Dish Recycling Robots. Machines 10(5), 294. DOI: 10.3390/machines10050294. Online publication date: 22-Apr-2022.
