A Semi-supervised Clustering Algorithm Based on Must-Link Set

  • Conference paper
Advanced Data Mining and Applications (ADMA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Abstract

Clustering analysis is traditionally considered an unsupervised learning process. In most cases, however, people have some prior or background knowledge before they perform the clustering. How to use this knowledge to improve cluster quality and clustering efficiency has become a hot research topic in recent years. Must-Link and Cannot-Link constraints between instances are a common form of prior knowledge in many real applications. This paper presents the concept of the Must-Link Set and designs a new semi-supervised clustering algorithm, MLC-KMeans, that uses the Must-Link Set as an assistant centroid. Preliminary experiments on several UCI datasets confirm the effectiveness and efficiency of the algorithm.
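The abstract does not define the Must-Link Set or the assistant centroid in detail, so the following is only an illustrative sketch under two assumptions: that a Must-Link Set is a group of instances connected through the transitive closure of Must-Link pairs, and that its centroid is the mean of its members. The function names `must_link_sets` and `set_centroid` are hypothetical and are not taken from the paper; this is not the MLC-KMeans algorithm itself.

```python
from collections import defaultdict


def must_link_sets(n_points, must_link_pairs):
    """Group point indices by the transitive closure of must-link pairs.

    Assumption: a "Must-Link Set" is a connected component of the
    must-link graph. The paper's exact definition may differ.
    """
    parent = list(range(n_points))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)

    # Keep only groups that actually carry a constraint (two or more members).
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def set_centroid(points, members):
    """Mean of the member points (assumed form of an assistant centroid)."""
    dim = len(points[0])
    return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]


# Toy data: five 2-D points and three must-link pairs.
points = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0], [10.0, 10.0], [12.0, 10.0]]
pairs = [(0, 1), (1, 2), (3, 4)]

sets = must_link_sets(len(points), pairs)
centroids = [set_centroid(points, s) for s in sets]
print(sets)
print(centroids)
```

In this toy case the pairs (0, 1) and (1, 2) merge into a single set because Must-Link is treated as transitive. How such set centroids would then interact with the ordinary k-means centroids is specific to the paper and is not reproduced here.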

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, H., Cheng, Y., Zhao, R. (2008). A Semi-supervised Clustering Algorithm Based on Must-Link Set. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_48

  • DOI: https://doi.org/10.1007/978-3-540-88192-6_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88191-9

  • Online ISBN: 978-3-540-88192-6

  • eBook Packages: Computer Science, Computer Science (R0)
