
Accelerate DNN Performance with Sparse Matrix Compression in Halide

Published: 05 August 2019

Abstract

Machine learning now profoundly impacts every aspect of our lives. As the field has evolved, techniques such as deep learning have improved both the accuracy and the performance of machine learning systems. Deep learning is a family of ML techniques built on neural networks composed of layers of transformations. The power consumption of deep learning becomes a serious problem in edge computing. One of the most computationally demanding operations in a DNN is convolution, which preserves the spatial arrangement of an image and extracts local patches as input features. Our goal is to give programmers an effective way to improve the performance of the convolution operation. In this paper, we propose the design of sparse matrix compression schedule primitives in Halide and show how to improve the convolution operation with the im2col method. Halide is an image processing programming language that separates an algorithm from its schedule. With this design, we can compress the result of the im2col matrix to achieve performance improvements. In our experiments, results show that the convolution operation can achieve a 20X speedup with our implementation.
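As a rough illustration of the idea the abstract describes (not the paper's Halide implementation), the following Python/NumPy sketch lowers a convolution to a matrix product via im2col and then compresses the resulting patch matrix in CSR form with SciPy; the 6x6 input, the kernel values, and the use of `scipy.sparse` are illustrative assumptions of ours:

```python
import numpy as np
from scipy.sparse import csr_matrix

def im2col(x, kh, kw):
    """Unfold a (H, W) input into a (kh*kw, num_windows) matrix
    whose columns are the flattened kh x kw sliding windows."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

# A mostly-zero input (e.g. a ReLU-pruned activation map):
# the im2col matrix inherits this sparsity, so compressing it pays off.
x = np.zeros((6, 6))
x[1, 2] = 3.0
x[4, 4] = -1.0

kernel = np.ones((3, 3))              # 3x3 filter, flattened to a vector
patches = im2col(x, 3, 3)             # dense im2col matrix, shape (9, 16)
sparse_patches = csr_matrix(patches)  # CSR compression of the im2col result

# Dense GEMM and sparse product yield the same convolution output.
dense_out = patches.T @ kernel.ravel()
sparse_out = sparse_patches.T @ kernel.ravel()
assert np.allclose(dense_out, sparse_out)
```

In the paper's setting the compression is expressed as Halide schedule primitives rather than an explicit `csr_matrix` call, but the payoff is the same: the sparse product skips the zero entries that dominate the im2col matrix.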


Cited By

  • (2023) Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs. ACM Transactions on Architecture and Code Optimization 20(3), 1-26. DOI: 10.1145/3600092. Online publication date: 27-May-2023.
  • (2021) Support Convolution of CNN with Compression Sparse Matrix Multiplication Flow in TVM. 50th International Conference on Parallel Processing Workshop, 1-7. DOI: 10.1145/3458744.3473352. Online publication date: 9-Aug-2021.
  • (2020) Devise Sparse Compression Schedulers to Enhance FastText Methods. Workshop Proceedings of the 49th International Conference on Parallel Processing, 1-8. DOI: 10.1145/3409390.3409394. Online publication date: 17-Aug-2020.


Published In

cover image ACM Other conferences
ICPP Workshops '19: Workshop Proceedings of the 48th International Conference on Parallel Processing
August 2019
241 pages
ISBN:9781450371964
DOI:10.1145/3339186

In-Cooperation

  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Deep Learning
  2. Halide
  3. OpenCL
  4. Sparse Matrix

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2019
ICPP 2019: Workshops
August 5 - 8, 2019
Kyoto, Japan

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

