DOI: 10.1145/3018661.3018721

S-HOT: Scalable High-Order Tucker Decomposition

Published: 02 February 2017

ABSTRACT

Multi-aspect data appear frequently in many web-related applications. For example, product reviews are quadruplets of (user, product, keyword, timestamp). How can we analyze such web-scale multi-aspect data? Can we analyze them on an off-the-shelf workstation with a limited amount of memory?

Tucker decomposition has been widely used for discovering patterns in relationships among entities in multi-aspect data, which are naturally expressed as high-order tensors. However, existing algorithms for Tucker decomposition have limited scalability and, in particular, fail to decompose high-order tensors (order ≥ 4) because they explicitly materialize intermediate data whose size grows rapidly as the order increases. We call this problem the M-Bottleneck ("Materialization Bottleneck").
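To give a rough sense of how quickly materialized intermediate data can grow with the order, the sketch below does the arithmetic under an assumption that is not stated in the abstract: that the intermediate is a dense matrix with `dim` rows and `rank ** (order - 1)` columns, as in the standard alternating formulation of Tucker decomposition. The function name and the sample values are hypothetical.

```python
def intermediate_bytes(dim, rank, order, bytes_per_value=8):
    """Bytes needed for a dense dim x rank**(order - 1) matrix.

    Assumption: this shape models the materialized intermediate data;
    the abstract itself does not give a size formula.
    """
    return dim * rank ** (order - 1) * bytes_per_value


if __name__ == "__main__":
    # Hypothetical setting: one million indices per mode, rank 10.
    for order in range(3, 7):
        gib = intermediate_bytes(10**6, 10, order) / 2**30
        print(f"order {order}: {gib:,.1f} GiB")
```

Under this assumption, each additional mode multiplies the intermediate size by the rank, which is why the order matters more than the number of nonzeros.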

To avoid the M-Bottleneck, we propose S-HOT, a scalable high-order Tucker decomposition method that employs on-the-fly computation to minimize the materialized intermediate data. Moreover, S-HOT is designed for handling disk-resident tensors, which are too large to fit in memory, without loading them all into memory at once. We provide a theoretical analysis of the amount of memory space and the number of scans of data required by S-HOT. In our experiments, S-HOT showed better scalability than baseline methods, not only with the order but also with the dimensionality and the rank. In particular, S-HOT decomposed tensors 1000× larger than baseline methods in terms of dimensionality. S-HOT also successfully analyzed real-world tensors that are both large-scale and high-order on an off-the-shelf workstation with a limited amount of memory, while baseline methods failed. The source code of S-HOT is publicly available at http://dm.postech.ac.kr/shot to encourage reproducibility.
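The idea of on-the-fly computation can be illustrated with a minimal sketch. It is not the S-HOT implementation, and the abstract does not describe the algorithm at this level of detail. The sketch assumes the intermediate matrix Y is needed only through products of the form Y Yᵀ v, and computes that product in two scans over the nonzero entries of a sparse tensor without ever storing Y. All function names and the `(index tuple, value)` entry format are hypothetical.

```python
import numpy as np


def kron_rows(factors, idx, skip):
    """Kronecker product of the factor rows picked by idx, skipping one mode."""
    out = np.ones(1)
    for m, A in enumerate(factors):
        if m != skip:
            out = np.kron(out, A[idx[m]])
    return out


def gram_matvec_on_the_fly(entries, factors, mode, v):
    """Return Y @ (Y.T @ v) without storing Y (assumed access pattern).

    Y is the mode-`mode` unfolding of the tensor multiplied by every
    factor matrix except factors[mode]. entries is a re-iterable
    sequence of (index tuple, value) pairs, e.g. re-read from disk.
    """
    width = int(np.prod([A.shape[1] for m, A in enumerate(factors) if m != mode]))
    # First scan: accumulate w = Y.T @ v, one nonzero at a time.
    w = np.zeros(width)
    for idx, val in entries:
        w += val * v[idx[mode]] * kron_rows(factors, idx, mode)
    # Second scan: accumulate out = Y @ w, again one nonzero at a time.
    out = np.zeros(len(v))
    for idx, val in entries:
        out[idx[mode]] += val * (kron_rows(factors, idx, mode) @ w)
    return out


def gram_matvec_materialized(entries, factors, mode, v):
    """Reference version that builds the dense Y explicitly."""
    width = int(np.prod([A.shape[1] for m, A in enumerate(factors) if m != mode]))
    Y = np.zeros((len(v), width))
    for idx, val in entries:
        Y[idx[mode]] += val * kron_rows(factors, idx, mode)
    return Y @ (Y.T @ v)
```

The on-the-fly version keeps only two vectors in memory, one of length `len(v)` and one of length `width`, at the cost of a second pass over the data. The materialized version stores their full outer-sized matrix, which is the kind of intermediate the abstract calls the M-Bottleneck.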


      Published in

      WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
      February 2017
      868 pages
      ISBN: 9781450346757
      DOI: 10.1145/3018661

      Copyright © 2017 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      WSDM '17 paper acceptance rate: 80 of 505 submissions (16%). Overall acceptance rate: 498 of 2,863 submissions (17%).
