On the Optimality of Subsets of Features Selected by Heuristic and Hyper-heuristic Approaches

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7867)

Abstract

The concepts of relevance and redundancy are central to feature selection algorithms that do not use a learning algorithm for subset evaluation. Redundancy is in fact a special form of relevance, in which there is a correlation (linear or nonlinear) among the input features of a problem. Therefore, a good heuristic for measuring relevance can also help detect redundancy. In this paper, we show that the solutions found by heuristic measures lack generality. Through counter-examples, we show that, regardless of the heuristic measure and search strategy used, filter methods cannot optimise the performance of all learning algorithms. We also show that different measures may embody different notions of relevance between features, and how this can cause important features to go undetected in certain problems. We then propose a hyper-heuristic method that generates an appropriate relevance measure for each problem. The new approach can alleviate the problem of missing relevant features.
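A minimal sketch, assuming discrete-valued features and relevance scores estimated from empirical frequencies, of two effects the abstract describes: a univariate relevance measure missing features that are only jointly relevant, and two measures disagreeing about whether a feature is relevant at all. The toy datasets (XOR and y = x²) and the helper names `entropy`, `mutual_information` and `pearson` are hypothetical illustrations, not the counter-examples or the hyper-heuristic method from the paper.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical I(X; Y) = H(X) + H(Y) - H(X, Y), treating values as symbols."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def pearson(xs, ys):
    """Pearson linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Hypothetical toy data 1 (not from the paper): class = A XOR B.
# Each input alone carries no information about the class,
# but the pair (A, B) determines it completely.
rows = list(product([0, 1], repeat=2))
a = [r[0] for r in rows]
b = [r[1] for r in rows]
cls = [x ^ y for x, y in rows]

mi_a = mutual_information(a, cls)
mi_b = mutual_information(b, cls)
mi_ab = mutual_information(list(zip(a, b)), cls)

# Hypothetical toy data 2 (not from the paper): y = x^2 on a symmetric range.
# Linear correlation reports no relationship; mutual information does.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
r = pearson(xs, ys)
mi_xy = mutual_information(xs, ys)

print(f"I(A;C)={mi_a:.3f}  I(B;C)={mi_b:.3f}  I(A,B;C)={mi_ab:.3f}")
# → I(A;C)=0.000  I(B;C)=0.000  I(A,B;C)=1.000
print(f"pearson(x, x^2)={r:.3f}  I(x; x^2)={mi_xy:.3f}")
# → pearson(x, x^2)=0.000  I(x; x^2)=1.522
```

A filter that ranks features one at a time by mutual information would discard both A and B in the first case, and a filter based on linear correlation would discard x in the second. Because every x value here is distinct, the mutual-information estimate in the second case simply equals H(y); real continuous features would need discretisation or a different estimator.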

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neshatian, K., Varn, L. (2013). On the Optimality of Subsets of Features Selected by Heuristic and Hyper-heuristic Approaches. In: Li, J., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_38

  • DOI: https://doi.org/10.1007/978-3-642-40319-4_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40318-7

  • Online ISBN: 978-3-642-40319-4

  • eBook Packages: Computer Science, Computer Science (R0)