
Accelerate DNN Performance with Sparse Matrix Compression in Halide

Published: 05 August 2019

Abstract

Machine learning now profoundly impacts every aspect of our lives. As the field has evolved, techniques such as deep learning have improved both the accuracy and the performance of machine learning systems. Deep learning is a family of ML techniques built on neural networks composed of layers of transformations. The power consumption of deep learning becomes a serious problem in edge computing. One of the most computationally demanding operations in a DNN is convolution, which preserves the spatial arrangement of an image and extracts local patches as input features. Our goal is to give programmers an effective way to improve the performance of the convolution operation. In this paper, we propose the design of sparse matrix compression schedule primitives in Halide and show how to improve the convolution operation with the im2col method. Halide is an image processing programming language that separates an algorithm from its schedule. With this design, we can compress the result of the im2col matrix to achieve performance improvements. In our experiments, results show that the convolution operation can achieve a 20X speedup with our implementation.
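As a rough illustration of the idea the abstract describes (not the paper's Halide implementation), the following Python/NumPy sketch lowers a convolution to a matrix product via im2col and then compresses the resulting patch matrix in CSR form with SciPy; the 6x6 input, the kernel values, and the use of `scipy.sparse` are illustrative assumptions of ours:

```python
import numpy as np
from scipy.sparse import csr_matrix

def im2col(x, kh, kw):
    """Unfold a (H, W) input into a (kh*kw, num_windows) matrix
    whose columns are the flattened kh x kw sliding windows."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

# A mostly-zero input (e.g. a ReLU-pruned activation map):
# the im2col matrix inherits this sparsity, so compressing it pays off.
x = np.zeros((6, 6))
x[1, 2] = 3.0
x[4, 4] = -1.0

kernel = np.ones((3, 3))              # 3x3 filter, flattened to a vector
patches = im2col(x, 3, 3)             # dense im2col matrix, shape (9, 16)
sparse_patches = csr_matrix(patches)  # CSR compression of the im2col result

# Dense GEMM and sparse product yield the same convolution output.
dense_out = patches.T @ kernel.ravel()
sparse_out = sparse_patches.T @ kernel.ravel()
assert np.allclose(dense_out, sparse_out)
```

In the paper's setting the compression is expressed as Halide schedule primitives rather than an explicit `csr_matrix` call, but the payoff is the same: the sparse product skips the zero entries that dominate the im2col matrix.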


Cited By

  • (2023) Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs. ACM Transactions on Architecture and Code Optimization 20(3), 1-26. DOI: 10.1145/3600092. Online publication date: 27-May-2023.
  • (2021) Support Convolution of CNN with Compression Sparse Matrix Multiplication Flow in TVM. 50th International Conference on Parallel Processing Workshop, 1-7. DOI: 10.1145/3458744.3473352. Online publication date: 9-Aug-2021.
  • (2020) Devise Sparse Compression Schedulers to Enhance FastText Methods. Workshop Proceedings of the 49th International Conference on Parallel Processing, 1-8. DOI: 10.1145/3409390.3409394. Online publication date: 17-Aug-2020.


Published In

cover image ACM Other conferences
ICPP Workshops '19: Workshop Proceedings of the 48th International Conference on Parallel Processing
August 2019
241 pages
ISBN:9781450371964
DOI:10.1145/3339186

In-Cooperation

  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Deep Learning
  2. Halide
  3. OpenCL
  4. Sparse Matrix

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2019
ICPP 2019: Workshops
August 5 - 8, 2019
Kyoto, Japan

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

