Feature Selection Based on Pairwise Classification Performance

  • Conference paper
Computer Aided Systems Theory - EUROCAST 2009 (EUROCAST 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5717)

Abstract

Feature selection is an important first step in building machine learning models. Feature selection algorithms can be grouped into wrappers and filters; the former use machine learning models to evaluate feature sets, the latter use other criteria to evaluate features individually. We present a new approach to feature selection that combines the advantages of both wrapper and filter approaches by using logistic regression and the area under the ROC curve (AUC) to evaluate pairs of features. After choosing as the starting feature the one with the highest individual discriminatory power, we incrementally rank features by choosing as the next feature the one that achieves the highest AUC in combination with an already chosen feature. To evaluate our approach, we compared it to standard filter and wrapper algorithms. Using two data sets from the biomedical domain, we demonstrate that our approach outperforms filter methods while performing comparably to wrapper methods at a smaller computational cost.
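The ranking procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that "individual discriminatory power" means the AUC of the raw feature values, taken in whichever direction is larger. It also assumes that AUC is measured on the training data, although the paper may use cross-validation or a held-out set. Logistic regression is fitted by plain gradient ascent so the example needs no third-party libraries. The function names `auc`, `fit_logistic` and `pairwise_rank` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs in which the positive case scores higher, counting ties as 1/2.
    Requires at least one case of each class."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Fit [intercept, w1, ..., wk] by batch gradient ascent on the
    log-likelihood (assumption: unregularized, a stand-in for a real solver)."""
    k = len(X[0])
    w = [0.0] * (k + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - sigmoid(z)
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def pairwise_rank(data: List[List[float]], y: Sequence[int]) -> List[int]:
    """Rank feature indices: start with the best single feature, then
    repeatedly add the feature whose best pairing with an already chosen
    feature gives the highest logistic-regression AUC (training-set AUC)."""
    m = len(data[0])
    cache: Dict[Tuple[int, int], float] = {}

    def pair_auc(i: int, j: int) -> float:
        key = (min(i, j), max(i, j))
        if key not in cache:  # each pair is fitted at most once
            X = [[row[key[0]], row[key[1]]] for row in data]
            w = fit_logistic(X, y)
            # The linear predictor is monotone in the predicted probability,
            # so it gives the same AUC.
            scores = [w[0] + w[1] * a + w[2] * b for a, b in X]
            cache[key] = auc(scores, y)
        return cache[key]

    def single_auc(f: int) -> float:
        # Direction-agnostic: a feature that ranks negatives higher is
        # equally discriminative.
        a = auc([row[f] for row in data], y)
        return max(a, 1.0 - a)

    first = max(range(m), key=single_auc)
    ranking = [first]
    remaining = sorted(set(range(m)) - {first})
    while remaining:
        # Ties are broken in favour of the lower feature index.
        best = max(remaining,
                   key=lambda f: max(pair_auc(c, f) for c in ranking))
        ranking.append(best)
        remaining.remove(best)
    return ranking
```

With m features, this sketch fits each of the m(m-1)/2 two-feature models at most once. This illustrates why a pairwise criterion can be cheaper than a wrapper that evaluates many larger subsets. On a toy data set in which one feature perfectly separates the classes, the ranking starts with that feature.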

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dreiseitl, S., Osl, M. (2009). Feature Selection Based on Pairwise Classification Performance. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory - EUROCAST 2009. EUROCAST 2009. Lecture Notes in Computer Science, vol 5717. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04772-5_99

  • DOI: https://doi.org/10.1007/978-3-642-04772-5_99

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04771-8

  • Online ISBN: 978-3-642-04772-5

  • eBook Packages: Computer Science, Computer Science (R0)
