cuFasterTucker: A Stochastic Optimization Strategy for Parallel Sparse FastTucker Decomposition on GPU Platform

Published: 08 June 2024

Abstract

The amount of scientific data is growing at an unprecedented pace, and tensors are a common form of such data, exhibiting high-order, high-dimensional, and sparse structure. While tensor-based analysis methods are effective, the vast increase in data size has made processing the original tensor infeasible. Tensor decomposition offers a solution by factoring the tensor into multiple low-rank matrices or tensors that tensor-based analysis methods can then use efficiently. One such algorithm is the Tucker decomposition, which decomposes an N-order tensor into N low-rank factor matrices and a low-rank core tensor. However, many Tucker decomposition techniques generate large intermediate variables and require significant computational resources, rendering them inadequate for high-order and high-dimensional tensors. This article introduces the FasterTucker decomposition, a novel approach to tensor decomposition that builds on the FastTucker decomposition, a variant of the Tucker decomposition. We propose an efficient parallel FasterTucker decomposition algorithm, cuFasterTucker, designed to run on a GPU platform. Our algorithm has low storage and computational requirements and provides an effective solution for high-order and high-dimensional sparse tensor decomposition. Compared to state-of-the-art algorithms, it achieves a speedup of approximately 7 to 23 times.
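As context for the abstract, the Tucker model it describes can be written down explicitly. The notation below (tensor X, core G, factor matrices A^(n), mode sizes I_n, ranks J_n) is the conventional one from the tensor literature, not taken from the article itself:

```latex
% Tucker decomposition of an N-order tensor (conventional notation):
% the small core tensor G is contracted with one factor matrix per mode,
% where \times_n denotes the mode-n tensor-matrix product.
\[
  \mathcal{X} \;\approx\; \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},
\]
\[
  \mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}, \qquad
  \mathcal{G} \in \mathbb{R}^{J_1 \times \cdots \times J_N}, \qquad
  A^{(n)} \in \mathbb{R}^{I_n \times J_n}, \qquad J_n \ll I_n .
\]
```

The abstract's point about large intermediate variables can also be made concrete: under the Tucker model, any single entry of a sparse tensor can be approximated from the small core and just the N relevant factor rows, without materializing a dense reconstruction. The sketch below is a minimal NumPy illustration of that per-entry evaluation under the conventional notation above; it is not the article's cuFasterTucker code, which performs this style of computation in parallel on the GPU:

```python
import numpy as np

# Hypothetical per-entry Tucker evaluation (illustration only, not the
# article's implementation). For a nonzero at index (i_1, ..., i_N),
#   x_hat = G x_1 A1[i_1, :] x_2 ... x_N AN[i_N, :],
# so only N factor rows and the small core are touched, and the large
# intermediate variables of naive dense Tucker updates never appear.
def tucker_entry(core, factors, idx):
    """core: (J_1, ..., J_N) array; factors: list of (I_n, J_n) arrays."""
    g = core
    for n, i_n in enumerate(idx):
        # Contract the current leading mode of the shrinking core with
        # the i_n-th row of the mode-n factor matrix.
        g = np.tensordot(factors[n][i_n, :], g, axes=([0], [0]))
    return float(g)  # after N contractions, g is a scalar

rng = np.random.default_rng(0)
dims, ranks = (50, 60, 70), (4, 4, 4)
core = rng.standard_normal(ranks)
factors = [rng.standard_normal((I, J)) for I, J in zip(dims, ranks)]
print(tucker_entry(core, factors, (3, 14, 25)))
```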

Published In

ACM Transactions on Parallel Computing, Volume 11, Issue 2
June 2024
164 pages
EISSN: 2329-4957
DOI: 10.1145/3613599

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2024
Online AM: 16 February 2024
Accepted: 06 February 2024
Revised: 12 August 2023
Received: 31 March 2023
Published in TOPC Volume 11, Issue 2

Author Tags

  1. GPU CUDA parallelization
  2. Kruskal approximation
  3. sparse tensor decomposition
  4. stochastic strategy
  5. tensor computation

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Program of China
  • Key Program of National Natural Science Foundation of China
  • National Natural Science Foundation of China
