
Accelerating AI Applications with Sparse Matrix Compression in Halide

Journal of Signal Processing Systems

Abstract

Machine learning profoundly impacts every aspect of our lives, and techniques such as deep learning continue to improve its accuracy and performance. Nonetheless, computations over large data with large memory footprints remain a bottleneck for deep learning applications. Among the most computationally demanding DNN operations are the matrix multiplications behind the convolution and fully connected layers; a convolution layer preserves the spatial arrangement of the image and takes partial image patches as its input features. Our goal is to give programmers an effective way to improve the performance of such matrix multiplication layers. Halide is an image processing programming language that separates an algorithm from its schedule; with Halide, one can easily improve the performance of code using built-in scheduling primitives. In this paper, we propose sparse matrix compression schedule primitives for Halide that support several compression schemes, and we apply them to accelerate convolution implemented with the im2col method: by compressing the weight matrix, the design improves convolution performance. The proposed compression scheduling also benefits natural language processing (NLP). Word embedding models convert words into multidimensional vectors, mapping tokens that carry no intrinsic meaning to vectors that encode semantic relationships. We focus on the word representation application in FastText, where general matrix-vector multiplication (GEMV) is one of the most computationally intensive operations; we refine the software architecture of FastText and preprocess the pretrained model ahead of time. Our experiments show that the proposed design improves both convolution and GEMV performance.
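
To make the abstract concrete, two small sketches follow. The first shows the property the proposed primitives build on: Halide separates what is computed (the algorithm) from how it is computed (the schedule). It is a minimal dense matrix multiplication using only stock Halide API (Func, Var, RDom, vectorize, parallel) and assumes a Halide installation; the sparse-compression primitives proposed in the paper are the authors' extension and are not part of the public Halide release, so they are not shown here.

#include "Halide.h"
#include <cstdio>
using namespace Halide;

int main() {
    const int M = 64, N = 64, K = 64;
    // A is K x M, B is N x K; Halide buffers index (innermost, outermost).
    Buffer<float> A(K, M), B(N, K);
    A.fill(1.0f);
    B.fill(2.0f);

    // Algorithm: what to compute.
    Var i("i"), j("j");
    RDom k(0, K, "k");
    Func gemm("gemm");
    gemm(i, j) = 0.0f;
    gemm(i, j) += A(k, j) * B(i, k);

    // Schedule: how to compute it. Changing these lines changes
    // performance, never the result.
    gemm.update().vectorize(i, 8).parallel(j);

    Buffer<float> out = gemm.realize({N, M});
    printf("out(0, 0) = %f\n", out(0, 0));  // K * 1.0 * 2.0 = 128
}

The second sketch illustrates, in plain C++, the pipeline the abstract describes: a sparse convolution filter is stored in compressed sparse row (CSR) form, the image is lowered to a patch matrix with im2col, and the convolution becomes a sparse-times-dense matrix product that touches only the filter's nonzeros. The helper names (to_csr, im2col, csr_matmul) are illustrative, not the paper's API, and the sketch assumes a single input channel, stride 1, and no padding.

#include <cstdio>
#include <vector>

// Minimal CSR representation: values, their column indices, and
// per-row offsets into those arrays.
struct CSR {
    std::vector<float> val;
    std::vector<int>   col;
    std::vector<int>   rowptr;  // rowptr[r]..rowptr[r+1] spans row r
    int rows = 0, cols = 0;
};

// Compress a dense row-major matrix, dropping exact zeros.
CSR to_csr(const std::vector<float>& dense, int rows, int cols) {
    CSR m;
    m.rows = rows; m.cols = cols;
    m.rowptr.push_back(0);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            float v = dense[r * cols + c];
            if (v != 0.0f) { m.val.push_back(v); m.col.push_back(c); }
        }
        m.rowptr.push_back((int)m.val.size());
    }
    return m;
}

// im2col for a single-channel H x W image and a k x k kernel,
// stride 1, no padding: each output pixel becomes one column of
// k*k patch values, so convolution becomes a matrix product.
std::vector<float> im2col(const std::vector<float>& img,
                          int H, int W, int k) {
    int oh = H - k + 1, ow = W - k + 1;
    std::vector<float> cols((size_t)(k * k) * oh * ow);
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x)
            for (int dy = 0; dy < k; ++dy)
                for (int dx = 0; dx < k; ++dx)
                    cols[(size_t)(dy * k + dx) * (oh * ow) + y * ow + x] =
                        img[(y + dy) * W + (x + dx)];
    return cols;
}

// Sparse (CSR) x dense product: only A's nonzeros are touched,
// which is the payoff of compressing a sparse weight matrix.
std::vector<float> csr_matmul(const CSR& A,
                              const std::vector<float>& B, int bcols) {
    std::vector<float> out((size_t)A.rows * bcols, 0.0f);
    for (int r = 0; r < A.rows; ++r)
        for (int i = A.rowptr[r]; i < A.rowptr[r + 1]; ++i)
            for (int c = 0; c < bcols; ++c)
                out[(size_t)r * bcols + c] +=
                    A.val[i] * B[(size_t)A.col[i] * bcols + c];
    return out;
}

int main() {
    // 4x4 image and one sparse 3x3 filter (two nonzeros out of nine).
    std::vector<float> img = { 1, 2, 3, 4,  5, 6, 7, 8,
                               9,10,11,12, 13,14,15,16 };
    std::vector<float> filt = { 0, 0, 0,
                                0, 1, 0,
                                0, 0, 2 };
    CSR w = to_csr(filt, /*rows=*/1, /*cols=*/9);   // filter as 1 x 9 row
    auto cols = im2col(img, 4, 4, 3);               // 9 x 4 patch matrix
    auto out = csr_matmul(w, cols, /*bcols=*/4);    // 1 x 4 = 2x2 output map
    for (float v : out) printf("%g ", v);           // prints: 28 31 40 43
    printf("\n");
}

With bcols = 1 the same csr_matmul routine degenerates to a sparse GEMV, the FastText hot spot the abstract targets; as the abstract notes, the compression itself can be done ahead of time when preprocessing the pretrained model.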




Acknowledgements

This work was supported in part by MediaTek and NSTC Taiwan.

Author information


Corresponding author

Correspondence to Chao-Lin Lee.

Ethics declarations

Conflicts of Interest

The authors have no relevant financial or nonfinancial interests to disclose, no competing interests relevant to the content of this article, and no affiliations with or involvement in any organization or entity with a financial or nonfinancial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lee, CL., Chao, CT., Chu, WH. et al. Accelerating AI Applications with Sparse Matrix Compression in Halide. J Sign Process Syst 95, 609–622 (2023). https://doi.org/10.1007/s11265-022-01821-z

