DOI: 10.1145/2807591.2807671

An input-adaptive and in-place approach to dense tensor-times-matrix multiply

Published: 15 November 2015

ABSTRACT

This paper describes a novel framework, called InTensLi ("intensely"), for producing fast single-node implementations of dense tensor-times-matrix multiply (Ttm) of arbitrary dimension. Whereas conventional implementations of Ttm rely on explicitly converting the input tensor operand into a matrix, in order to use any available fast general matrix-matrix multiply (Gemm) implementation, our framework's strategy is to carry out the Ttm in-place, avoiding this copy. Because the resulting implementations expose tuning parameters, this paper also describes a heuristic empirical model for selecting an optimal configuration based on the Ttm's inputs. When compared to the widely used single-node Ttm implementations available in the Tensor Toolbox and the Cyclops Tensor Framework (Ctf), InTensLi's in-place and input-adaptive Ttm implementations achieve 4× and 13× speedups, respectively, showing Gemm-like performance on a variety of input sizes.
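To make the contrast in the abstract concrete, the following is a minimal NumPy sketch (not the paper's InTensLi code; the function names and shapes are illustrative assumptions) of the two strategies for a mode-n Ttm, Y = X ×_n U: the conventional route that matricizes the tensor, calls one large Gemm, and folds the result back, versus a contraction that skips the explicit unfolding round-trip.

```python
import numpy as np

def ttm_unfold(X, U, n):
    """Conventional TTM: explicitly matricize X along mode n,
    call one large GEMM, then fold the result back.
    The unfolding and folding each materialize a copy of the data."""
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)  # mode-n unfolding (copy)
    Yn = U @ Xn                                        # single GEMM call
    new_shape = (U.shape[0],) + X.shape[:n] + X.shape[n + 1:]
    return np.moveaxis(Yn.reshape(new_shape), 0, n)    # fold back (copy)

def ttm_direct(X, U, n):
    """TTM in the spirit of the paper's in-place strategy: contract
    mode n directly against U, leaving the other modes in place,
    with no explicit matricization round-trip."""
    Y = np.tensordot(U, X, axes=(1, n))  # sum over U's columns and X's mode n
    return np.moveaxis(Y, 0, n)          # put the new mode back in position n

# A 3-way example: multiply a 4x5x6 tensor by a 7x5 matrix along mode 1.
X = np.random.rand(4, 5, 6)
U = np.random.rand(7, 5)
assert ttm_direct(X, U, 1).shape == (4, 7, 6)
assert np.allclose(ttm_unfold(X, U, 1), ttm_direct(X, U, 1))
```

The two routines agree numerically; the performance question the paper studies is which loop/blocking structure over the tensor's modes lets the copy-free formulation run the underlying Gemm kernels at near-peak speed.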


Published in

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015, 985 pages
ISBN: 9781450337236
DOI: 10.1145/2807591
General Chair: Jackie Kern; Program Chair: Jeffrey S. Vetter
          Copyright © 2015 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SC '15 paper acceptance rate: 79 of 358 submissions (22%). Overall acceptance rate: 1,516 of 6,373 submissions (24%).
