Efficient Genetic Algorithm Based Data Mining Using Feature Selection with Hausdorff Distance

Information Technology and Management

Abstract

The development of powerful computers and faster input/output devices, coupled with the need to store and analyze data, has resulted in massive databases (on the order of terabytes). Such volumes of data clearly overwhelm more traditional data analysis methods. A new generation of tools and techniques is needed for finding interesting patterns in the data and discovering useful knowledge. In this paper we present the design of more effective and efficient genetic-algorithm-based data mining techniques that combine self-adaptive feature selection with a wrapper feature selection method based on the Hausdorff distance measure.
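For readers unfamiliar with the measure named in the abstract, the sketch below computes the standard textbook Hausdorff distance between two finite point sets: the largest distance from any point in one set to its nearest neighbour in the other, taken in both directions. The abstract does not say how the paper applies the measure to feature selection, so this is only an illustration of the definition. The function names and the sample points are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical example: two small sets of 2-D points (not data from the paper).
set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 3.0), (1.0, 4.0)]
print(hausdorff(set_a, set_b))  # prints 4.0
```

In this example the distance is determined by the point (1, 4), whose nearest neighbour in the first set, (1, 0), is 4 units away. Two identical sets have distance 0.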

Author information

Corresponding author

Correspondence to Riyaz Sikora.

About this article

Cite this article

Sikora, R., Piramuthu, S. Efficient Genetic Algorithm Based Data Mining Using Feature Selection with Hausdorff Distance. Inf Technol Manage 6, 315–331 (2005). https://doi.org/10.1007/s10799-005-3898-3

  • DOI: https://doi.org/10.1007/s10799-005-3898-3
