Abstract
In feature selection, classification accuracy typically needs to be estimated in order to guide the search towards useful subsets. It has been shown earlier [1] that such estimates should not be used directly to determine the optimal subset size, or the benefit gained by choosing the best subset: owing to a phenomenon called overfitting, these estimates tend to be biased. An outer loop of cross-validation has previously been suggested to combat this problem. However, this paper points out that a straightforward implementation of such an approach still gives biased estimates for the increase in accuracy that could be obtained by selecting the best-performing subset. In addition, two methods are suggested that circumvent this problem and give virtually unbiased results while adding hardly any computational overhead.
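The outer cross-validation loop mentioned in the abstract can be made concrete with a short sketch. The Python snippet below (using scikit-learn, which is our own assumption; the paper prescribes no particular toolkit) wraps a greedy forward search inside an outer loop, so that the accuracy of each chosen subset is measured on data that played no part in the selection. Note that this is the conventional scheme whose gain estimates the paper argues are still biased, not the paper's proposed correction; the dataset, classifier, and subset size are purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: 200 samples, 20 candidate features.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=4, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

outer_scores = []
for train_idx, test_idx in outer.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_te, y_te = X[test_idx], y[test_idx]

    # Inner search: greedy forward selection guided by cross-validated
    # accuracy computed on the training part only. This inner estimate
    # is the one that overfits and should not be reported as the result.
    selector = SequentialFeatureSelector(
        clf, n_features_to_select=5, direction="forward", cv=5)
    selector.fit(X_tr, y_tr)

    # Outer evaluation: the selected subset is scored on held-out data
    # that played no role in the search, giving a less biased estimate.
    clf.fit(selector.transform(X_tr), y_tr)
    outer_scores.append(clf.score(selector.transform(X_te), y_te))

print("Outer-loop accuracy estimate: %.3f" % np.mean(outer_scores))
```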
References
Reunanen, J.: A pitfall in determining the optimal feature subset size. In: Proc. of the 4th Int. Workshop on Pattern Recognition in Information Systems (PRIS 2004), Porto, Portugal, pp. 176–185 (2004)
Schalkoff, R.J.: Pattern Recognition: Statistical, Structural and Neural Approaches. John Wiley & Sons, Inc., Chichester (1992)
Devijver, P.A., Kittler, J.: Pattern Recognition: A Statistical Approach. Prentice–Hall International, Englewood Cliffs (1982)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
John, G.H., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Proc. of the 11th Int. Conf. on Machine Learning (ICML 1994), New Brunswick, NJ, USA, pp. 121–129 (1994)
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182 (2003)
Stone, M.: Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B 36(2), 111–133 (1974)
Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc. of the 14th Int. Joint Conf. on Artificial Intelligence (IJCAI 1995), Montreal, Canada, pp. 1137–1143 (1995)
Whitney, A.W.: A direct method of nonparametric measurement selection. IEEE Transactions on Computers 20(9), 1100–1103 (1971)
Pudil, P., Novovičová, J., Kittler, J.: Floating search methods in feature selection. Pattern Recognition Letters 15(11), 1119–1125 (1994)
Somol, P., Pudil, P., Novovičová, J., Paclík, P.: Adaptive floating search methods in feature selection. Pattern Recognition Letters 20(11–13), 1157–1163 (1999)
Jensen, D.D., Cohen, P.R.: Multiple comparisons in induction algorithms. Machine Learning 38(3), 309–338 (2000)
Reunanen, J.: Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research 3, 1371–1382 (2003)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Reunanen, J. (2006). Less Biased Measurement of Feature Selection Benefits. In: Saunders, C., Grobelnik, M., Gunn, S., Shawe-Taylor, J. (eds) Subspace, Latent Structure and Feature Selection. SLSFS 2005. Lecture Notes in Computer Science, vol 3940. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752790_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34137-6
Online ISBN: 978-3-540-34138-3
eBook Packages: Computer Science (R0)