Abstract
In many real-world applications, only a few labeled samples exist, while a large number of unlabeled samples are available. With so few labels, traditional semi-supervised algorithms have difficulty generating classifiers that are useful for evaluating the labeling confidence of unlabeled samples. This paper proposes SSCCE, a new semi-supervised classification algorithm based on clustering ensembles. It takes advantage of clustering ensembles to generate multiple partitions of a given dataset, and then uses the clustering consistency index to determine the labeling confidence of unlabeled samples. The algorithm overcomes some defects of traditional semi-supervised classification algorithms and enhances the performance of a hypothesis trained on very few labeled samples by exploiting a large number of unlabeled samples. Experiments on ten public data sets from the UCI Machine Learning Repository show that the method is effective and feasible.
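The general idea of scoring unlabeled samples by how consistently they cluster together with labeled ones can be illustrated with a minimal sketch. This is not the SSCCE algorithm itself: the abstract does not define the clustering consistency index, so the score below (the fraction of partitions in which an unlabeled sample shares a cluster with the labeled samples of a class) is an assumed stand-in. The plain k-means base clusterer, the function names `kmeans` and `label_confidence`, and all parameter values are likewise hypothetical choices made for illustration.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain k-means; returns one cluster id per point (assumed base clusterer)."""
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def label_confidence(points, labels, n_partitions=10, k=2, seed=0):
    """labels[i] is a class for labeled samples and None for unlabeled ones.

    Returns {index: (predicted_class, confidence)} for the unlabeled samples.
    The confidence is an assumed co-association score, not the paper's index.
    """
    rng = random.Random(seed)
    # The ensemble: several partitions from differently initialized runs.
    partitions = [kmeans(points, k, rng) for _ in range(n_partitions)]
    labeled = [(i, y) for i, y in enumerate(labels) if y is not None]
    classes = sorted({y for _, y in labeled})
    result = {}
    for u, y in enumerate(labels):
        if y is not None:
            continue
        scores = {}
        for cls in classes:
            idx = [i for i, yi in labeled if yi == cls]
            # Fraction of (partition, labeled sample) pairs sharing u's cluster.
            agree = sum(part[u] == part[i] for part in partitions for i in idx)
            scores[cls] = agree / (len(partitions) * len(idx))
        best = max(scores, key=scores.get)
        result[u] = (best, scores[best])
    return result

# Toy data: two well-separated groups with one labeled sample in each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
labels = ["a", None, None, "b", None, None]
for index, (cls, conf) in sorted(label_confidence(points, labels).items()):
    print(index, cls, round(conf, 2))
```

In a scheme of this kind, unlabeled samples whose confidence exceeds a chosen threshold could be added to the labeled set before the final classifier is trained; the threshold and the choice of classifier are left open here.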
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, S., Guo, G., Chen, L. (2009). Semi-supervised Classification Based on Clustering Ensembles. In: Deng, H., Wang, L., Wang, F.L., Lei, J. (eds) Artificial Intelligence and Computational Intelligence. AICI 2009. Lecture Notes in Computer Science, vol 5855. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05253-8_69
DOI: https://doi.org/10.1007/978-3-642-05253-8_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05252-1
Online ISBN: 978-3-642-05253-8
eBook Packages: Computer Science, Computer Science (R0)