Semi-supervised Classification Based on Clustering Ensembles

Chen, Si; Guo, Gongde; Chen, Lifei

doi:10.1007/978-3-642-05253-8_69

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5855)

Included in the following conference series:

International Conference on Artificial Intelligence and Computational Intelligence

Abstract

In many real-world applications, only a few labeled samples exist, while a large number of unlabeled samples are available. With so few labels, some traditional semi-supervised algorithms have difficulty generating classifiers that are useful for evaluating the labeling confidence of unlabeled samples. This paper proposes a new semi-supervised classification algorithm based on clustering ensembles, named SSCCE. It takes advantage of clustering ensembles to generate multiple partitions of a given dataset, and then uses the clustering consistency index to determine the labeling confidence of unlabeled samples. The algorithm can overcome some defects of traditional semi-supervised classification algorithms and enhance the performance of a hypothesis trained on very few labeled samples by exploiting a large number of unlabeled samples. Experiments on ten public data sets from the UCI Machine Learning Repository show that the method is effective and feasible.
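The idea in the abstract can be illustrated with a minimal sketch. It is not the SSCCE algorithm itself: the abstract does not define the clustering consistency index, so the score below is an assumed stand-in built on a co-association matrix (how often two samples fall in the same cluster across partitions). The base clusterer (a plain k-means), the function names, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster id per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every sample to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for c in range(k):
            members = X[assign == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return assign

def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    P = np.asarray(partitions)  # shape (n_partitions, n_samples)
    same = P[:, :, None] == P[:, None, :]
    return same.mean(axis=0)

def label_confidence(co, labeled_idx, y_labeled, n_classes):
    """Assumed stand-in score: mean co-association with each class's labeled samples."""
    labeled_idx = np.asarray(labeled_idx)
    y_labeled = np.asarray(y_labeled)
    scores = np.zeros((co.shape[0], n_classes))
    for c in range(n_classes):
        members = labeled_idx[y_labeled == c]
        if len(members) > 0:
            scores[:, c] = co[:, members].mean(axis=1)
    # Proposed label and its confidence for every sample.
    return scores.argmax(axis=1), scores.max(axis=1)

# Toy data: two well-separated blobs with a single labeled sample in each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(20, 2))])
y_true = np.array([0] * 20 + [1] * 20)
labeled_idx, y_labeled = [0, 20], [0, 1]

# Ensemble of partitions obtained by varying the number of clusters.
partitions = [kmeans_labels(X, k, rng) for k in (2, 2, 3, 3, 4)]
co = co_association(partitions)
pred, conf = label_confidence(co, labeled_idx, y_labeled, n_classes=2)

# Unlabeled samples ranked by confidence; the top few would be the
# candidates to add to the labeled set before retraining a classifier.
unlabeled = np.setdiff1d(np.arange(len(X)), labeled_idx)
top = unlabeled[np.argsort(-conf[unlabeled])[:5]]
```

In a self-training style loop, the most confident candidates in `top` would be given their proposed labels from `pred`, moved into the labeled set, and the classifier retrained; how SSCCE actually selects and uses these samples is described in the full paper, not in the abstract.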

Author information

Authors and Affiliations

School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, China
Si Chen, Gongde Guo & Lifei Chen

Editor information

Editors and Affiliations

School of Business Information Technology, RMIT University, City Campus, 124 La Trobe Street, 3000, Melbourne, Victoria, Australia
Hepu Deng
College of Metrological Technology and Engineering, China Jiliang University, 310018, Hangzhou, Zhejiang Province, China
Lanzhou Wang
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon Tong, Hong Kong, China
Fu Lee Wang
College of Information Science and Technology, Hainan University, 570228, Haikou, China
Jingsheng Lei

About this paper

Cite this paper

Chen, S., Guo, G., Chen, L. (2009). Semi-supervised Classification Based on Clustering Ensembles. In: Deng, H., Wang, L., Wang, F.L., Lei, J. (eds) Artificial Intelligence and Computational Intelligence. AICI 2009. Lecture Notes in Computer Science, vol 5855. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05253-8_69

DOI: https://doi.org/10.1007/978-3-642-05253-8_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05252-1
Online ISBN: 978-3-642-05253-8
eBook Packages: Computer Science, Computer Science (R0)