
Random Relevant and Non-redundant Feature Subspaces for Co-training

  • Conference paper
Intelligent Data Engineering and Automated Learning - IDEAL 2009 (IDEAL 2009)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5788)

Abstract

Random feature subspace selection can produce diverse classifiers and thereby help Co-training, as shown by the RASCO algorithm of Wang et al. (2008). On data sets with many irrelevant or noisy features, however, RASCO may end up with inaccurate classifiers. To remedy this problem, we introduce two algorithms that select relevant and non-redundant feature subspaces for Co-training. The first, Rel-RASCO (Relevant Random Subspaces for Co-training), produces subspaces by drawing features with probabilities proportional to their relevances. We also modify a successful feature selection algorithm, mRMR (Minimum Redundancy Maximum Relevance), for random feature subset selection, yielding Prob-mRMR (Probabilistic mRMR). Experiments on five datasets demonstrate that the proposed algorithms outperform both RASCO and Co-training in terms of the accuracy achieved at the end of Co-training. A theoretical analysis of the proposed algorithms is also provided.
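The relevance-weighted subspace sampling that the abstract attributes to Rel-RASCO can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name is hypothetical, and the assumption that relevance scores come from something like feature-label mutual information is ours.

```python
import numpy as np

def rel_rasco_subspaces(relevances, n_subspaces, subspace_dim, rng=None):
    """Draw random feature subspaces in which each feature's chance of
    being picked is proportional to its relevance score (for example,
    the mutual information between the feature and the class label)."""
    rng = np.random.default_rng(rng)
    relevances = np.asarray(relevances, dtype=float)
    probs = relevances / relevances.sum()
    subspaces = []
    for _ in range(n_subspaces):
        # Sample distinct feature indices without replacement,
        # weighted by relevance, so relevant features appear in
        # more subspaces while subspaces stay diverse.
        idx = rng.choice(len(probs), size=subspace_dim,
                         replace=False, p=probs)
        subspaces.append(np.sort(idx))
    return subspaces
```

Each returned index array would then define the feature view on which one Co-training classifier is trained; irrelevant (low-relevance) features are rarely drawn, which is the failure mode of uniform sampling that the abstract ascribes to plain RASCO.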



References

  1. Roli, F.: Semi-supervised multiple classifier systems: Background and research directions. In: Oza, N.C., Polikar, R., Kittler, J., Roli, F. (eds.) MCS 2005. LNCS, vol. 3541, pp. 1–11. Springer, Heidelberg (2005)

  2. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proc. of the 11th Annual Conference on Computational Learning Theory (COLT 1998), pp. 92–100 (1998)

  3. Wang, J., Luo, S.W., Zeng, X.H.: A random subspace method for co-training. In: International Joint Conference on Neural Networks (IJCNN 2008), pp. 195–200 (2008)

  4. Li, M., Zhou, Z.H.: Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Transactions on Systems, Man and Cybernetics 6, 1088–1098 (2007)

  5. Didaci, L., Roli, F.: Using co-training and self-training in semi-supervised multiple classifier systems. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds.) SSPR 2006 and SPR 2006. LNCS, vol. 4109, pp. 522–530. Springer, Heidelberg (2006)

  6. Hady, M.F.A., Schwenker, F.: Co-training by committee: A new semi-supervised learning framework. In: IEEE International Conference on Data Mining Workshops, pp. 563–572 (2008)

  7. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, Hoboken (2004)

  8. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1226–1238 (2005)

  9. Asuncion, A., Newman, D.: UCI machine learning repository (2007)

  10. Boley, D., Gini, M., Gross, R., Han, E., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., Moore, J.: Partitioning-based clustering for web document categorization. Decision Support Systems 27, 329–341 (1999)

  11. Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing 10(5), 293–302 (2002)

  12. Duin, R.: PRTools, a Matlab Toolbox for Pattern Recognition (2004)

  13. Moerchen, F., Ultsch, A., Thies, M., Loehken, I.: Modelling timbre distance with temporal statistics from polyphonic music. IEEE Transactions on Speech and Audio Processing 14, 81–90 (2006)




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yaslan, Y., Cataltepe, Z. (2009). Random Relevant and Non-redundant Feature Subspaces for Co-training. In: Corchado, E., Yin, H. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2009. IDEAL 2009. Lecture Notes in Computer Science, vol 5788. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04394-9_83


  • DOI: https://doi.org/10.1007/978-3-642-04394-9_83

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04393-2

  • Online ISBN: 978-3-642-04394-9

  • eBook Packages: Computer Science (R0)
