Semi-supervised Classification Based on Clustering Ensembles

  • Conference paper
Artificial Intelligence and Computational Intelligence (AICI 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5855)

Abstract

In many real-world applications, very few labeled samples exist while a large number of unlabeled samples are available. With so few labels, it is difficult for some traditional semi-supervised algorithms to produce classifiers good enough to evaluate the labeling confidence of unlabeled samples. In this paper, a new semi-supervised classification algorithm based on clustering ensembles, named SSCCE, is proposed. It takes advantage of clustering ensembles to generate multiple partitions of a given dataset, and then uses the clustering consistency index to determine the labeling confidence of unlabeled samples. The algorithm can overcome some defects of traditional semi-supervised classification algorithms and enhance the performance of a hypothesis trained on very few labeled samples by exploiting a large number of unlabeled samples. Experiments carried out on ten public data sets from the UCI machine learning repository show that this method is effective and feasible.
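
The general idea of scoring unlabeled samples by how consistently an ensemble of partitions groups them with labeled samples can be illustrated with a minimal sketch. This is not the SSCCE algorithm itself: the abstract does not define the clustering consistency index, so the co-association score below (the fraction of partitions in which an unlabeled sample shares a cluster with the labeled samples of a class) is an assumed stand-in. The function names `kmeans` and `label_confidence`, the toy one-dimensional data, and the use of randomly seeded k-means to build the ensemble are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster id per point."""
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign


def label_confidence(partitions, labeled, n_samples):
    """Return {unlabeled index: (best class, confidence)}.

    ASSUMPTION: confidence is a co-association score, i.e. the fraction of
    (partition, labeled sample of that class) pairs in which the unlabeled
    sample falls in the same cluster. It stands in for the paper's
    consistency index, which the abstract does not define.
    """
    by_class = defaultdict(list)
    for idx, cls in labeled.items():
        by_class[cls].append(idx)

    result = {}
    for i in range(n_samples):
        if i in labeled:
            continue
        scores = {}
        for cls, members in by_class.items():
            together = sum(part[i] == part[m]
                           for part in partitions for m in members)
            scores[cls] = together / (len(partitions) * len(members))
        best = max(scores, key=scores.get)
        result[i] = (best, scores[best])
    return result


# Toy data: two well-separated one-dimensional groups, one label per group.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (4.9,)]
labeled = {0: "a", 4: "b"}

# Ensemble of partitions from differently seeded k-means runs.
partitions = [kmeans(points, 2, random.Random(seed)) for seed in range(5)]
confidence = label_confidence(partitions, labeled, len(points))

for idx, (cls, conf) in sorted(confidence.items()):
    print(idx, cls, conf)
```

Comparing cluster co-membership rather than raw cluster ids sidesteps the fact that different runs may number the same cluster differently. In a self-training loop, only samples whose confidence exceeds a chosen threshold would be added to the labeled set; that threshold is likewise an assumption and is not taken from the paper.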

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, S., Guo, G., Chen, L. (2009). Semi-supervised Classification Based on Clustering Ensembles. In: Deng, H., Wang, L., Wang, F.L., Lei, J. (eds) Artificial Intelligence and Computational Intelligence. AICI 2009. Lecture Notes in Computer Science, vol 5855. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05253-8_69

  • DOI: https://doi.org/10.1007/978-3-642-05253-8_69

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05252-1

  • Online ISBN: 978-3-642-05253-8

  • eBook Packages: Computer Science, Computer Science (R0)
