Clustering on Multi-source Incomplete Data via Tensor Modeling and Factorization

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9078)

Abstract

With advances in data collection technologies, multiple data sources are becoming increasingly prominent in many applications, and clustering from multiple data sources has emerged as an important topic in the data mining and machine learning community. Different data sources provide knowledge at different levels of detail, so combining them is pivotal to the clustering process. In practice, however, the data are usually heterogeneous and incomplete. The key challenge is how to effectively integrate information from multiple heterogeneous sources in the presence of missing data. Conventional methods mainly focus on clustering heterogeneous data in which all sources are complete or at least one source has no missing values. In this paper, we propose a more general framework, T-MIC (Tensor based Multi-source Incomplete data Clustering), to integrate multiple incomplete data sources. Specifically, we first use the kernel matrices to form an initial tensor across all sources. We then formulate a joint tensor factorization process with a sparsity constraint and use it to iteratively push the initial tensor towards a quality-driven exploration of the latent factors, taking into account the uncertainty of the missing data. Finally, these factors serve as features for clustering. Extensive experiments on both synthetic and real datasets demonstrate that the proposed approach can effectively boost clustering performance, even with large amounts of missing data.
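
The three steps named in the abstract (kernel tensor, sparse factorization that accounts for missing entries, clustering on the latent factors) can be illustrated with a minimal sketch. It is not the authors' T-MIC formulation, which the abstract does not spell out. It assumes RBF kernels, a CP factorization fitted by alternating least squares with unobserved entries imputed from the current estimate, and soft-thresholding as a simple stand-in for the sparsity constraint. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """RBF kernel matrix over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def build_kernel_tensor(sources, present, gamma=0.5):
    """Stack per-source kernels into an (n, n, V) tensor with an observed-entry mask.

    sources[v] is an (n, d_v) array; present[i, v] is True if instance i
    was observed in source v. Rows of absent instances are ignored.
    """
    n, V = present.shape
    T = np.zeros((n, n, V))
    W = np.zeros((n, n, V), dtype=bool)
    for v, X in enumerate(sources):
        idx = np.flatnonzero(present[:, v])
        T[idx[:, None], idx[None, :], v] = rbf_kernel(X[idx], gamma)
        W[idx[:, None], idx[None, :], v] = True
    return T, W

def khatri_rao(B, C):
    """Column-wise Kronecker product of (J, R) and (K, R), shape (J*K, R)."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def soft_threshold(M, lam):
    """Shrink entries towards zero (assumed stand-in for a sparsity constraint)."""
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)

def masked_sparse_cp(T, W, rank, lam=1e-3, iters=100, seed=0):
    """Heuristic rank-`rank` CP fit: impute missing entries, ALS step, shrink."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (rng.random((d, rank)) for d in (I, J, K))
    ridge = 1e-8 * np.eye(rank)
    for _ in range(iters):
        # Fill unobserved entries with the current low-rank estimate.
        X = np.where(W, T, np.einsum('ir,jr,kr->ijk', A, B, C))
        # Source-mode factor: least squares, then unit-norm columns so the
        # scale lives in the penalized factors A and B.
        C = np.moveaxis(X, 2, 0).reshape(K, -1) @ khatri_rao(A, B) \
            @ np.linalg.pinv((A.T @ A) * (B.T @ B) + ridge)
        C /= np.maximum(np.linalg.norm(C, axis=0), 1e-12)
        A = X.reshape(I, -1) @ khatri_rao(B, C) \
            @ np.linalg.pinv((B.T @ B) * (C.T @ C) + ridge)
        A = soft_threshold(A, lam)
        B = np.moveaxis(X, 1, 0).reshape(J, -1) @ khatri_rao(A, C) \
            @ np.linalg.pinv((A.T @ A) * (C.T @ C) + ridge)
        B = soft_threshold(B, lam)
    return A, B, C

def kmeans(F, k, iters=50, seed=0):
    """Plain Lloyd's k-means on the rows of F."""
    rng = np.random.default_rng(seed)
    centers = F[rng.choice(len(F), size=k, replace=False)]
    for _ in range(iters):
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = F[labels == c].mean(axis=0)
    return labels

# Toy data: two clusters seen through three sources, each missing some instances.
rng = np.random.default_rng(0)
n, k = 40, 2
truth = np.repeat(np.arange(k), n // k)
sources = [truth[:, None] * 3.0 + rng.normal(size=(n, d)) for d in (3, 4, 5)]
present = rng.random((n, len(sources))) > 0.3  # roughly 30% missing per source
T, W = build_kernel_tensor(sources, present)
A, B, C = masked_sparse_cp(T, W, rank=k)
labels = kmeans(A, k)  # rows of the instance-mode factor serve as features
```

In this sketch the instance-mode factor `A` plays the role of the latent features, and the mask `W` keeps entries that belong to absent instances from being treated as observed zeros.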



Author information

Corresponding author

Correspondence to Lifang He.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Shao, W., He, L., Yu, P.S. (2015). Clustering on Multi-source Incomplete Data via Tensor Modeling and Factorization. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_38

  • DOI: https://doi.org/10.1007/978-3-319-18032-8_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18031-1

  • Online ISBN: 978-3-319-18032-8

  • eBook Packages: Computer Science, Computer Science (R0)
