
Random Relevant and Non-redundant Feature Subspaces for Co-training

  • Conference paper
Intelligent Data Engineering and Automated Learning - IDEAL 2009 (IDEAL 2009)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5788)

Abstract

Random feature subspace selection can produce diverse classifiers and thereby help Co-training, as shown by the RASCO algorithm of Wang et al. (2008). On data sets with many irrelevant or noisy features, however, RASCO may end up with inaccurate classifiers. To remedy this problem, we introduce two algorithms that select relevant and non-redundant feature subspaces for Co-training. The first, Rel-RASCO (Relevant Random Subspaces for Co-training), produces subspaces by drawing features with probabilities proportional to their relevances. We also modify a successful feature selection algorithm, mRMR (Minimum Redundancy Maximum Relevance), for random feature subset selection, yielding Prob-mRMR (Probabilistic mRMR). Experiments on five datasets demonstrate that the proposed algorithms outperform both RASCO and Co-training in terms of the accuracy achieved at the end of Co-training. A theoretical analysis of the proposed algorithms is also provided.
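The relevance-weighted subspace sampling that the abstract attributes to Rel-RASCO can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name is hypothetical, and the assumption that relevance scores come from something like feature-label mutual information is ours.

```python
import numpy as np

def rel_rasco_subspaces(relevances, n_subspaces, subspace_dim, rng=None):
    """Draw random feature subspaces in which each feature's chance of
    being picked is proportional to its relevance score (for example,
    the mutual information between the feature and the class label)."""
    rng = np.random.default_rng(rng)
    relevances = np.asarray(relevances, dtype=float)
    probs = relevances / relevances.sum()
    subspaces = []
    for _ in range(n_subspaces):
        # Sample distinct feature indices without replacement,
        # weighted by relevance, so relevant features appear in
        # more subspaces while subspaces stay diverse.
        idx = rng.choice(len(probs), size=subspace_dim,
                         replace=False, p=probs)
        subspaces.append(np.sort(idx))
    return subspaces
```

Each returned index array would then define the feature view on which one Co-training classifier is trained; irrelevant (low-relevance) features are rarely drawn, which is the failure mode of uniform sampling that the abstract ascribes to plain RASCO.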



References

  1. Roli, F.: Semi-supervised multiple classifier systems: Background and research directions. In: Oza, N.C., Polikar, R., Kittler, J., Roli, F. (eds.) MCS 2005. LNCS, vol. 3541, pp. 1–11. Springer, Heidelberg (2005)

  2. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proc. of the 11th Annual Conference on Computational Learning Theory (COLT 1998), pp. 92–100 (1998)

  3. Wang, J., Luo, S.W., Zeng, X.H.: A random subspace method for co-training. In: International Joint Conference on Neural Networks (IJCNN 2008), pp. 195–200 (2008)

  4. Li, M., Zhou, Z.H.: Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Transactions on Systems, Man and Cybernetics 6, 1088–1098 (2007)

  5. Didaci, L., Roli, F.: Using co-training and self-training in semi-supervised multiple classifier systems. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds.) SSPR 2006 and SPR 2006. LNCS, vol. 4109, pp. 522–530. Springer, Heidelberg (2006)

  6. Hady, M.F.A., Schwenker, F.: Co-training by committee: A new semi-supervised learning framework. In: IEEE International Conference on Data Mining Workshops, pp. 563–572 (2008)

  7. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, Hoboken (2004)

  8. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1226–1238 (2005)

  9. Asuncion, A., Newman, D.: UCI machine learning repository (2007)

  10. Boley, D., Gini, M., Gross, R., Han, E., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., Moore, J.: Partitioning-based clustering for web document categorization. Decision Support Systems 27, 329–341 (1999)

  11. Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing 10(5), 293–302 (2002)

  12. Duin, R.: PRTools, a Matlab Toolbox for Pattern Recognition (2004)

  13. Moerchen, F., Ultsch, A., Thies, M., Loehken, I.: Modelling timbre distance with temporal statistics from polyphonic music. IEEE Transactions on Speech and Audio Processing 14, 81–90 (2006)




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yaslan, Y., Cataltepe, Z. (2009). Random Relevant and Non-redundant Feature Subspaces for Co-training. In: Corchado, E., Yin, H. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2009. IDEAL 2009. Lecture Notes in Computer Science, vol 5788. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04394-9_83


  • DOI: https://doi.org/10.1007/978-3-642-04394-9_83

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04393-2

  • Online ISBN: 978-3-642-04394-9

  • eBook Packages: Computer Science (R0)
