
Feature Subset Selection by Estimation of Distribution Algorithms

  • Chapter
Estimation of Distribution Algorithms

Part of the book series: Genetic Algorithms and Evolutionary Computation ((GENA,volume 2))

Abstract

Feature Subset Selection is a well-known task in the Machine Learning, Data Mining, Pattern Recognition and Text Learning paradigms. In this chapter, we present a set of techniques inspired by Estimation of Distribution Algorithms (EDAs) to tackle the Feature Subset Selection problem in Machine Learning and Data Mining tasks. Bayesian networks are used to factorize the probability distribution of the best solutions in small- and medium-dimensionality datasets, while simpler probabilistic models are used in higher-dimensionality domains. In a comparison with different sequential and genetic-inspired algorithms on natural and artificial datasets, the EDA-based approaches obtain encouraging accuracy results and require fewer evaluations than the genetic approaches.
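To illustrate the kind of approach the abstract describes, here is a minimal sketch of a univariate EDA (UMDA-style) for feature subset selection. This is an assumption-laden toy, not the chapter's actual algorithm: the chapter's main proposals factorize the selection distribution with Bayesian networks, whereas this sketch uses the simpler independent-marginal model mentioned for higher-dimensional domains, and the fitness function here is a hypothetical stand-in for the wrapper classifier accuracy a real run would use.

```python
import random

def umda_feature_selection(fitness, n_features, pop_size=50, n_select=25,
                           generations=20, seed=0):
    """UMDA-style sketch: a solution is a binary mask over features.

    Each generation, masks are sampled from independent per-feature
    probabilities; the best half of the population is used to re-estimate
    those probabilities, biasing later samples toward useful features.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_features  # no initial preference for any feature
    best_mask, best_fit = None, float("-inf")
    for _ in range(generations):
        # sample a population of feature masks from the current marginals
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        f = fitness(pop[0])
        if f > best_fit:
            best_mask, best_fit = pop[0], f
        # re-estimate the marginal probability of each feature being selected
        selected = pop[:n_select]
        probs = [sum(ind[j] for ind in selected) / n_select
                 for j in range(n_features)]
    return best_mask, best_fit

# Hypothetical fitness: reward selecting the first 5 features and lightly
# penalize extras -- standing in for a classifier's estimated accuracy.
def toy_fitness(mask):
    return sum(mask[:5]) - 0.1 * sum(mask[5:])

mask, fit = umda_feature_selection(toy_fitness, n_features=20)
```

In a real wrapper setting, `toy_fitness` would be replaced by cross-validated accuracy of a classifier trained on the masked feature set, which is where the evaluation-count savings reported in the abstract matter.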





Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Inza, I., Larrañaga, P., Sierra, B. (2002). Feature Subset Selection by Estimation of Distribution Algorithms. In: Larrañaga, P., Lozano, J.A. (eds) Estimation of Distribution Algorithms. Genetic Algorithms and Evolutionary Computation, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1539-5_13


  • DOI: https://doi.org/10.1007/978-1-4615-1539-5_13

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5604-2

  • Online ISBN: 978-1-4615-1539-5

  • eBook Packages: Springer Book Archive
