A Novel Cluster Combination Algorithm for Document Clustering

  • Conference paper
Intelligent Science and Intelligent Data Engineering (IScIDE 2012)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7751)

Abstract

Ensemble techniques have been successfully applied in supervised machine learning to increase the accuracy and stability of base learners. Recently, analogous techniques have been investigated in unsupervised machine learning. Research has shown that combining an ensemble of multiple clusterings can yield a superior solution. In this paper, we cast the cluster combination problem as finding a “best” subspace and formulate it as an optimization problem. We then derive the solution from basic concepts and theorems of linear algebra, which leads to a novel cluster combination algorithm. We compare our algorithm with other common cluster ensemble algorithms on real-world datasets. Experimental results demonstrate the effectiveness of our algorithm.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, S., Wang, Z., Li, X., Cao, R. (2013). A Novel Cluster Combination Algorithm for Document Clustering. In: Yang, J., Fang, F., Sun, C. (eds) Intelligent Science and Intelligent Data Engineering. IScIDE 2012. Lecture Notes in Computer Science, vol 7751. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36669-7_24

  • DOI: https://doi.org/10.1007/978-3-642-36669-7_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36668-0

  • Online ISBN: 978-3-642-36669-7

  • eBook Packages: Computer Science, Computer Science (R0)
