cuFasterTucker: A Stochastic Optimization Strategy for Parallel Sparse FastTucker Decomposition on GPU Platform

Published: 08 June 2024

Abstract

The amount of scientific data is growing at an unprecedented pace, and tensors are a common form of such data, exhibiting high-order, high-dimensional, and sparse structure. While tensor-based analysis methods are effective, the vast increase in data size has made processing the original tensor infeasible. Tensor decomposition offers a solution by factoring the tensor into multiple low-rank matrices or tensors that tensor-based analysis methods can then use efficiently. One such algorithm is the Tucker decomposition, which decomposes an N-order tensor into N low-rank factor matrices and a low-rank core tensor. However, many Tucker decomposition techniques generate large intermediate variables and require significant computational resources, rendering them inadequate for high-order and high-dimensional tensors. This article introduces the FasterTucker decomposition, a novel approach to tensor decomposition that builds on the FastTucker decomposition, a variant of the Tucker decomposition. We propose an efficient parallel FasterTucker decomposition algorithm, cuFasterTucker, designed to run on a GPU platform. Our algorithm has low storage and computational requirements and provides an effective solution for high-order and high-dimensional sparse tensor decomposition. Compared to state-of-the-art algorithms, it achieves a speedup of approximately 7 to 23 times.
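As context for the abstract, the Tucker model it describes can be written down explicitly. The notation below (tensor X, core G, factor matrices A^(n), mode sizes I_n, ranks J_n) is the conventional one from the tensor literature, not taken from the article itself:

```latex
% Tucker decomposition of an N-order tensor (conventional notation):
% the small core tensor G is contracted with one factor matrix per mode,
% where \times_n denotes the mode-n tensor-matrix product.
\[
  \mathcal{X} \;\approx\; \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},
\]
\[
  \mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}, \qquad
  \mathcal{G} \in \mathbb{R}^{J_1 \times \cdots \times J_N}, \qquad
  A^{(n)} \in \mathbb{R}^{I_n \times J_n}, \qquad J_n \ll I_n .
\]
```

The abstract's point about large intermediate variables can also be made concrete: under the Tucker model, any single entry of a sparse tensor can be approximated from the small core and just the N relevant factor rows, without materializing a dense reconstruction. The sketch below is a minimal NumPy illustration of that per-entry evaluation under the conventional notation above; it is not the article's cuFasterTucker code, which performs this style of computation in parallel on the GPU:

```python
import numpy as np

# Hypothetical per-entry Tucker evaluation (illustration only, not the
# article's implementation). For a nonzero at index (i_1, ..., i_N),
#   x_hat = G x_1 A1[i_1, :] x_2 ... x_N AN[i_N, :],
# so only N factor rows and the small core are touched, and the large
# intermediate variables of naive dense Tucker updates never appear.
def tucker_entry(core, factors, idx):
    """core: (J_1, ..., J_N) array; factors: list of (I_n, J_n) arrays."""
    g = core
    for n, i_n in enumerate(idx):
        # Contract the current leading mode of the shrinking core with
        # the i_n-th row of the mode-n factor matrix.
        g = np.tensordot(factors[n][i_n, :], g, axes=([0], [0]))
    return float(g)  # after N contractions, g is a scalar

rng = np.random.default_rng(0)
dims, ranks = (50, 60, 70), (4, 4, 4)
core = rng.standard_normal(ranks)
factors = [rng.standard_normal((I, J)) for I, J in zip(dims, ranks)]
print(tucker_entry(core, factors, (3, 14, 25)))
```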

Published In

ACM Transactions on Parallel Computing, Volume 11, Issue 2
June 2024
164 pages
EISSN: 2329-4957
DOI: 10.1145/3613599

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2024
Online AM: 16 February 2024
Accepted: 06 February 2024
Revised: 12 August 2023
Received: 31 March 2023
Published in TOPC Volume 11, Issue 2

Author Tags

  1. GPU CUDA parallelization
  2. Kruskal approximation
  3. sparse tensor decomposition
  4. stochastic strategy
  5. tensor computation

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Program of China
  • Key Program of National Natural Science Foundation of China
  • National Natural Science Foundation of China
