Document Clustering with an Augmented Nonnegative Matrix Factorization Model

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8444)

Abstract

In this paper, we propose an augmented NMF model to investigate the latent features of documents. The model combines the original nonnegative matrix factorization (NMF) with the local invariance assumption for document clustering. In our experiments, we first compare our model with baseline algorithms on several benchmark datasets. We then evaluate the effectiveness of the proposed model on datasets from CiteULike, comparing the clustering results against the Web of Science subject categories. On both the benchmark datasets and the CiteULike datasets, the proposed model outperforms many state-of-the-art clustering methods.
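
The abstract does not state the model's objective or update rules, so the following is only an illustrative sketch of the general idea of pairing NMF with a local invariance term. It assumes the well-known graph-regularized NMF formulation (Cai et al., 2011), which minimizes ||X - U V^T||_F^2 + lam * Tr(V^T L V) with multiplicative updates; it is not the authors' augmented model. The function names, the cosine k-nearest-neighbour graph, the parameter defaults, and the toy term-document matrix are all assumptions made for illustration.

```python
import numpy as np

def knn_graph(X, k=3):
    """Symmetric 0/1 k-nearest-neighbour graph over the columns (documents)
    of X, using cosine similarity. Assumed graph construction."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    S = Xn.T @ Xn
    np.fill_diagonal(S, -np.inf)           # a document is not its own neighbour
    n = S.shape[0]
    idx = np.argsort(-S, axis=1)[:, :k]    # k most similar documents per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)              # symmetrize

def graph_regularized_nmf(X, n_clusters, lam=1.0, k=3, n_iter=200, seed=0):
    """Factor X (terms x documents) as U @ V.T with U, V >= 0, while
    encouraging neighbouring documents to have similar rows in V."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, n_clusters))
    V = rng.random((n, n_clusters))
    W = knn_graph(X, k)
    D = np.diag(W.sum(axis=1))
    L = D - W                              # graph Laplacian
    eps = 1e-10                            # guards against division by zero

    def objective():
        fit = np.linalg.norm(X - U @ V.T) ** 2
        smooth = np.trace(V.T @ L @ V)
        return fit + lam * smooth

    history = [objective()]
    for _ in range(n_iter):
        # Multiplicative updates keep U and V nonnegative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
        history.append(objective())
    return U, V, history

# Hypothetical toy data: 6 terms x 8 documents in two topical blocks plus noise.
rng = np.random.default_rng(42)
X_toy = np.zeros((6, 8))
X_toy[:3, :4] = 1.0
X_toy[3:, 4:] = 1.0
X_toy += 0.05 * rng.random(X_toy.shape)

U, V, history = graph_regularized_nmf(X_toy, n_clusters=2)
labels = V.argmax(axis=1)                  # cluster = dominant latent factor
print(labels)
print(history[0], history[-1])
```

Assigning each document to its largest latent factor is one common way to read clusters off an NMF representation; running k-means on the rows of `V` is another option. Setting `lam=0` reduces the sketch to plain NMF, which makes the effect of the neighbourhood term easy to compare.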




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Xiong, Z., Zang, Y., Jiang, X., Hu, X. (2014). Document Clustering with an Augmented Nonnegative Matrix Factorization Model. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_29

  • DOI: https://doi.org/10.1007/978-3-319-06605-9_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06604-2

  • Online ISBN: 978-3-319-06605-9

  • eBook Packages: Computer Science, Computer Science (R0)
