Abstract
Machine learning profoundly impacts many aspects of our lives, and techniques such as deep learning continue to improve its accuracy and performance. Nonetheless, large-scale computations with large memory footprints remain a bottleneck for deep learning applications. Matrix multiplication underlies the most computationally demanding DNN operations, including the convolution layer, which preserves the spatial arrangement of the image and takes local image patches as input features, and the fully connected layer. Our goal is to give programmers an effective way to improve the performance of such matrix multiplication layers. Halide is an image processing programming language that separates an algorithm from its schedule, so performance can be tuned with built-in scheduling primitives without changing the algorithm. In this paper, we propose sparse matrix compression schedule primitives for Halide that support different compression schemes, and we use them to accelerate convolution implemented with the im2col method: the weight matrix is compressed, and the compressed form drives a faster convolution. The proposed compression scheduling also benefits natural language processing (NLP). Word embedding models map words to multidimensional vectors, turning symbolic tokens into representations that carry semantic meaning. We focus on the word representation application in FastText, in which general matrix-vector multiplication (GEMV) is one of the most computationally intensive operations; we refine the software architecture of FastText and preprocess the pretrained model ahead of time. Our experiments show that the proposed design improves both convolution and GEMV performance.
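To make the compression idea concrete, the following is a minimal, standalone C++ sketch of one common compression scheme, compressed sparse row (CSR), followed by a sparse GEMV. It is not the proposed Halide schedule primitives or the FastText code; the names CsrMatrix, compress, and spmv are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// Compressed Sparse Row (CSR) form of a dense row-major matrix:
// only the nonzero values are kept, together with their column
// indices and per-row offsets into the value array.
struct CsrMatrix {
    std::vector<float> values;   // nonzero entries, stored row by row
    std::vector<int>   col_idx;  // column index of each nonzero
    std::vector<int>   row_ptr;  // row i spans [row_ptr[i], row_ptr[i+1])
    int rows = 0;
    int cols = 0;
};

// Compress a dense row-major matrix (rows x cols) into CSR form.
CsrMatrix compress(const std::vector<float>& dense, int rows, int cols) {
    CsrMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.row_ptr.push_back(0);
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            float v = dense[static_cast<std::size_t>(i) * cols + j];
            if (v != 0.0f) {
                m.values.push_back(v);
                m.col_idx.push_back(j);
            }
        }
        m.row_ptr.push_back(static_cast<int>(m.values.size()));
    }
    return m;
}

// Sparse GEMV: y = A * x, visiting only the nonzero entries of A.
std::vector<float> spmv(const CsrMatrix& a, const std::vector<float>& x) {
    std::vector<float> y(a.rows, 0.0f);
    for (int i = 0; i < a.rows; ++i)
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            y[i] += a.values[k] * x[a.col_idx[k]];
    return y;
}
```

When the weight matrix is sufficiently sparse, this representation reduces both memory traffic and multiply-accumulate work, which is the same trade-off the compression schedule primitives expose for im2col convolution and for the GEMV hot loop in FastText.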
Acknowledgements
This work was supported in part by MediaTek and NSTC Taiwan.
Ethics declarations
Conflicts of Interest
The authors have no relevant financial or nonfinancial interests to disclose and no competing interests relevant to the content of this article. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial or nonfinancial interest in the subject matter or materials discussed in this manuscript.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lee, CL., Chao, CT., Chu, WH. et al. Accelerating AI Applications with Sparse Matrix Compression in Halide. J Sign Process Syst 95, 609–622 (2023). https://doi.org/10.1007/s11265-022-01821-z