Abstract
Machine learning profoundly impacts many aspects of our lives, and techniques such as deep learning continue to improve its accuracy and performance. Nonetheless, large-scale computations with large memory footprints remain a bottleneck for deep learning applications. Matrix multiplication underlies the most computationally demanding DNN operations, including the convolution layer, which preserves the spatial arrangement of the image and takes local image patches as input features, and the fully connected layer. Our goal is to give programmers an effective way to improve the performance of such matrix multiplication layers. Halide is an image processing programming language that separates an algorithm from its schedule, so performance can be tuned with built-in scheduling primitives without changing the algorithm. In this paper, we propose sparse matrix compression schedule primitives for Halide that support different compression schemes, and we use them to accelerate convolution implemented with the im2col method: the weight matrix is compressed, and the compressed form drives a faster convolution. The proposed compression scheduling also benefits natural language processing (NLP). Word embedding models map words to multidimensional vectors, turning symbolic tokens into representations that carry semantic meaning. We focus on the word representation application in FastText, in which general matrix-vector multiplication (GEMV) is one of the most computationally intensive operations; we refine the software architecture of FastText and preprocess the pretrained model ahead of time. Our experiments show that the proposed design improves both convolution and GEMV performance.
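To make the compression idea concrete, the following is a minimal, standalone C++ sketch of one common compression scheme, compressed sparse row (CSR), followed by a sparse GEMV. It is not the proposed Halide schedule primitives or the FastText code; the names CsrMatrix, compress, and spmv are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// Compressed Sparse Row (CSR) form of a dense row-major matrix:
// only the nonzero values are kept, together with their column
// indices and per-row offsets into the value array.
struct CsrMatrix {
    std::vector<float> values;   // nonzero entries, stored row by row
    std::vector<int>   col_idx;  // column index of each nonzero
    std::vector<int>   row_ptr;  // row i spans [row_ptr[i], row_ptr[i+1])
    int rows = 0;
    int cols = 0;
};

// Compress a dense row-major matrix (rows x cols) into CSR form.
CsrMatrix compress(const std::vector<float>& dense, int rows, int cols) {
    CsrMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.row_ptr.push_back(0);
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            float v = dense[static_cast<std::size_t>(i) * cols + j];
            if (v != 0.0f) {
                m.values.push_back(v);
                m.col_idx.push_back(j);
            }
        }
        m.row_ptr.push_back(static_cast<int>(m.values.size()));
    }
    return m;
}

// Sparse GEMV: y = A * x, visiting only the nonzero entries of A.
std::vector<float> spmv(const CsrMatrix& a, const std::vector<float>& x) {
    std::vector<float> y(a.rows, 0.0f);
    for (int i = 0; i < a.rows; ++i)
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            y[i] += a.values[k] * x[a.col_idx[k]];
    return y;
}
```

When the weight matrix is sufficiently sparse, this representation reduces both memory traffic and multiply-accumulate work, which is the same trade-off the compression schedule primitives expose for im2col convolution and for the GEMV hot loop in FastText.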
Acknowledgements
This work was supported in part by MediaTek and NSTC Taiwan.
Ethics declarations
Conflicts of Interest
The authors have no relevant financial or nonfinancial interests to disclose and no competing interests relevant to the content of this article. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial or nonfinancial interest in the subject matter or materials discussed in this manuscript.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lee, CL., Chao, CT., Chu, WH. et al. Accelerating AI Applications with Sparse Matrix Compression in Halide. J Sign Process Syst 95, 609–622 (2023). https://doi.org/10.1007/s11265-022-01821-z