ABSTRACT
Multi-aspect data appear frequently in many web-related applications. For example, product reviews are quadruplets of (user, product, keyword, timestamp). How can we analyze such web-scale multi-aspect data? Can we analyze them on an off-the-shelf workstation with a limited amount of memory?
Tucker decomposition has been widely used for discovering patterns in relationships among entities in multi-aspect data, which are naturally expressed as high-order tensors. However, existing algorithms for Tucker decomposition have limited scalability. In particular, they fail to decompose high-order tensors (order ≥ 4) because they explicitly materialize intermediate data whose size grows rapidly as the order increases. We call this problem M-Bottleneck ("Materialization Bottleneck").
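To make the growth concrete, the following back-of-the-envelope sketch counts the entries of the intermediate matrix. It assumes the standard alternating-least-squares formulation of Tucker decomposition, in which the intermediate matrix for one mode has shape dim × rank^(order-1) and is stored densely. The function name and the sizes are hypothetical and chosen only for illustration.

```python
def intermediate_entries(dim, rank, order):
    """Number of entries in a dense dim x rank^(order-1) intermediate matrix.

    Assumption: every mode has the same dimensionality `dim` and the same
    rank `rank`; the shape follows the usual Tucker-ALS update for one mode.
    """
    return dim * rank ** (order - 1)


if __name__ == "__main__":
    dim, rank = 10**6, 10  # hypothetical sizes, not taken from the paper
    for order in (3, 4, 5):
        n = intermediate_entries(dim, rank, order)
        # 8 bytes per entry (double precision)
        print(order, n, f"{n * 8 / 1e9:.1f} GB")
```

Under these assumptions, each additional mode multiplies the intermediate size by the rank, so an order-5 tensor with these sizes would already need about 80 GB for a single intermediate matrix.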
To avoid M-Bottleneck, we propose S-HOT, a scalable high-order Tucker decomposition method that employs on-the-fly computation to minimize the materialized intermediate data. Moreover, S-HOT is designed to handle disk-resident tensors that are too large to fit in memory, without loading them into memory all at once. We provide a theoretical analysis of the amount of memory space and the number of data scans required by S-HOT. In our experiments, S-HOT scaled better than baseline methods not only with the order but also with the dimensionality and the rank. In particular, S-HOT decomposed tensors 1000× larger than baseline methods in terms of dimensionality. S-HOT also successfully analyzed real-world tensors that are both large-scale and high-order on an off-the-shelf workstation with a limited amount of memory, while baseline methods failed. The source code of S-HOT is publicly available at http://dm.postech.ac.kr/shot to encourage reproducibility.
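As a rough illustration of what on-the-fly computation can look like, the sketch below computes the product Y Yᵀ v, which an iterative eigensolver needs to find the leading left singular vectors of the intermediate matrix Y, without ever storing Y. It is a minimal sketch under stated assumptions, not the authors' implementation: the sparse tensor is assumed to be an iterable of (index tuple, value) pairs, and the names `kron_row` and `gram_matvec_on_the_fly` are hypothetical.

```python
import numpy as np


def kron_row(factors, index, skip):
    """Kronecker product of the factor rows picked by `index`, skipping mode `skip`."""
    row = np.ones(1)
    for mode, (A, i) in enumerate(zip(factors, index)):
        if mode != skip:
            row = np.kron(row, A[i])
    return row


def gram_matvec_on_the_fly(entries, factors, mode, v):
    """Return Y @ Y.T @ v without materializing Y.

    Y is the mode-`mode` unfolding of the tensor after projecting every
    other mode onto its factor matrix. `entries` is scanned twice.
    """
    width = int(np.prod([A.shape[1] for m, A in enumerate(factors) if m != mode]))
    # Scan 1: w = Y.T @ v, accumulated one nonzero at a time.
    w = np.zeros(width)
    for index, x in entries:
        w += x * v[index[mode]] * kron_row(factors, index, mode)
    # Scan 2: out = Y @ w, again one nonzero at a time.
    out = np.zeros(len(v))
    for index, x in entries:
        out[index[mode]] += x * (kron_row(factors, index, mode) @ w)
    return out
```

In this sketch the extra memory is one vector whose length is the product of the other modes' ranks, plus the output vector, instead of the full intermediate matrix. Each call makes two sequential passes over the nonzeros, so `entries` could in principle be re-read from disk on each pass rather than held in memory.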
Index Terms
- S-HOT: Scalable High-Order Tucker Decomposition