Mining billion-scale tensors: algorithms and discoveries

Jeon, Inah; Papalexakis, Evangelos E.; Faloutsos, Christos; Sael, Lee; Kang, U.

doi:10.1007/s00778-016-0427-4

Regular Paper
Published: 15 March 2016

Volume 25, pages 519–544, (2016)
The VLDB Journal

Inah Jeon¹,
Evangelos E. Papalexakis²,
Christos Faloutsos²,
Lee Sael³ &
U. Kang⁴

Abstract

How can we analyze large-scale real-world data with various attributes? Many real-world datasets with multiple attributes (e.g., network traffic logs, web data, social networks, knowledge bases, and sensor streams) are represented as multi-dimensional arrays, called tensors. To analyze a tensor, many data mining applications rely on tensor decompositions: detecting malicious attackers in network traffic logs (with source IP, destination IP, port number, timestamp), finding telemarketers in a phone call history (with sender, receiver, date), and identifying interesting concepts in a knowledge base (with subject, object, relation). However, current tensor decomposition methods do not scale to large, sparse real-world tensors with millions of rows, columns, and ‘fibers.’ In this paper, we propose HaTen2, a distributed method for large-scale tensor decompositions that runs on the MapReduce framework. Our careful design and implementation of HaTen2 dramatically reduce the size of intermediate data and the number of jobs, yielding higher scalability than the state-of-the-art method. Thanks to HaTen2, we analyze big real-world sparse tensors that the current state of the art cannot handle, and discover hidden concepts.
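To make the data model concrete, the following is a minimal single-machine sketch, not HaTen2's MapReduce implementation. It assumes a toy phone-call tensor (sender, receiver, day) stored in coordinate format, so only nonzero entries are kept, and it computes the matricized-tensor-times-Khatri-Rao product for the first mode, which is the dominant step in alternating least squares for a PARAFAC-style decomposition. The abstract does not say which decomposition or kernel the paper targets, so that choice, the data, the factor matrices, and the name `mttkrp_mode0` are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy call log: (sender, receiver, day) -> number of calls.
# Only nonzero entries are stored (coordinate format).
calls = {
    (0, 1, 0): 3.0,
    (0, 2, 0): 1.0,
    (1, 2, 1): 2.0,
    (2, 0, 1): 5.0,
}


def mttkrp_mode0(nonzeros, B, C, rank):
    """For each sender i and component r, accumulate
    sum over (j, k) of X[i, j, k] * B[j][r] * C[k][r],
    visiting nonzero entries only."""
    out = defaultdict(lambda: [0.0] * rank)
    for (i, j, k), value in nonzeros.items():
        row = out[i]
        for r in range(rank):
            row[r] += value * B[j][r] * C[k][r]
    return dict(out)


# Illustrative factor matrices with rank 2:
# B has one row per receiver, C has one row per day.
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
C = [[1.0, 2.0], [0.5, 1.0]]

M = mttkrp_mode0(calls, B, C, rank=2)
for sender in sorted(M):
    print(sender, M[sender])
```

The work is proportional to the number of nonzeros times the rank, rather than to the full size of the tensor, which is why sparse storage matters once each mode has millions of indices.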

Acknowledgments

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT and Future Planning (grants No. 2013R1A1A3005259 and No. 2013R1A1A1064409). The ICT at Seoul National University provides research facilities for this study.

Author information

Authors and Affiliations

LG Electronics, Seoul, Korea
Inah Jeon
Computer Science Department and iLab, CMU, Pittsburgh, PA, USA
Evangelos E. Papalexakis & Christos Faloutsos
Department of Computer Science, SUNY, Incheon, Korea
Lee Sael
Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
U. Kang

Corresponding author

Correspondence to U. Kang.

Ethics declarations

Conflict of interest

There are no potential conflicts of interest.

Human and animals rights statement

The research does not involve human participants and/or animals.

About this article

Cite this article

Jeon, I., Papalexakis, E.E., Faloutsos, C. et al. Mining billion-scale tensors: algorithms and discoveries. The VLDB Journal 25, 519–544 (2016). https://doi.org/10.1007/s00778-016-0427-4

Received: 01 March 2015
Revised: 30 January 2016
Accepted: 28 February 2016
Published: 15 March 2016
Issue Date: August 2016
DOI: https://doi.org/10.1007/s00778-016-0427-4
