
An evolutionary approach for automatically extracting intelligible classification rules

Knowledge and Information Systems

Abstract

The process of automatically extracting novel, useful and ultimately comprehensible information from large databases, known as data mining, has become increasingly important because of the ever-growing amounts of data collected by large organizations. Particular emphasis is placed on heuristic search methods that can discover patterns which are hard or impossible to detect with standard query mechanisms and classical statistical techniques. This paper presents an evolutionary system capable of extracting explicit classification rules, with special attention paid to finding easily interpretable rules that can support crucial decisions. The system's findings on a real-world problem, breast cancer diagnosis, are compared with those achieved by other methods.
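The abstract does not specify how the system encodes rules or which evolutionary operators it uses. Purely as an illustration of the general idea of evolving an explicit, readable classification rule, here is a minimal sketch under stated assumptions: a rule is a conjunction of per-attribute interval conditions ("IF lo <= x_i <= hi AND ... THEN target class"), fitness is classification accuracy minus a small per-condition penalty standing in for comprehensibility, and the search uses tournament selection, uniform crossover, mutation and elitism. The `Condition` encoding, the `penalty` value, all function names and the toy data are hypothetical choices for this sketch, not the authors' design.

```python
import random
from dataclasses import dataclass


@dataclass
class Condition:
    """One clause of a rule (assumed encoding): if active, require lo <= x[i] <= hi."""
    active: bool
    lo: float
    hi: float


def matches(rule, x):
    """A rule is a conjunction: it fires only if every active condition holds."""
    return all(c.lo <= v <= c.hi for c, v in zip(rule, x) if c.active)


def fitness(rule, data, target, penalty=0.01):
    """Accuracy of 'IF rule THEN target ELSE not target', minus a size penalty.

    The per-condition penalty is a hypothetical stand-in for a
    comprehensibility term: at equal accuracy, shorter rules score higher.
    """
    correct = sum(matches(rule, x) == (y == target) for x, y in data)
    return correct / len(data) - penalty * sum(c.active for c in rule)


def random_rule(rng, bounds):
    """Random rule whose intervals lie inside each attribute's range."""
    rule = []
    for lo, hi in bounds:
        a, b = sorted(rng.uniform(lo, hi) for _ in range(2))
        rule.append(Condition(rng.random() < 0.5, a, b))
    return rule


def mutate(rule, rng, bounds, rate=0.2):
    """Return a copy with clauses toggled and bounds jittered, keeping lo <= hi."""
    child = []
    for c, (lo, hi) in zip(rule, bounds):
        c = Condition(c.active, c.lo, c.hi)  # copy: parents may be shared
        if rng.random() < rate:
            c.active = not c.active
        if rng.random() < rate:
            step = rng.gauss(0, 0.1 * (hi - lo))
            if rng.random() < 0.5:
                c.lo = min(max(c.lo + step, lo), c.hi)
            else:
                c.hi = max(min(c.hi + step, hi), c.lo)
        child.append(c)
    return child


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen rules."""
    best = max(rng.sample(range(len(pop)), k), key=scores.__getitem__)
    return pop[best]


def evolve(data, target, bounds, pop_size=40, generations=60, seed=0):
    """Evolve a single interval rule predicting `target`."""
    rng = random.Random(seed)
    pop = [random_rule(rng, bounds) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(r, data, target) for r in pop]
        elite = pop[max(range(pop_size), key=scores.__getitem__)]
        nxt = [elite]  # elitism: the best rule so far is never lost
        while len(nxt) < pop_size:
            a, b = tournament(pop, scores, rng), tournament(pop, scores, rng)
            crossed = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            nxt.append(mutate(crossed, rng, bounds))
        pop = nxt
    return max(pop, key=lambda r: fitness(r, data, target))


if __name__ == "__main__":
    # Hypothetical toy data: class 1 exactly when the first attribute is >= 6.
    data = [((i, j), int(i >= 6)) for i in range(11) for j in range(11)]
    bounds = [(0, 10), (0, 10)]
    best = evolve(data, target=1, bounds=bounds)
    clauses = [f"{c.lo:.2f} <= x{i} <= {c.hi:.2f}"
               for i, c in enumerate(best) if c.active]
    print("IF", " AND ".join(clauses) or "TRUE", "THEN class 1")
    print("fitness:", round(fitness(best, data, 1), 3))
```

The output is a single human-readable IF-THEN rule rather than an opaque model, which is the property the abstract emphasizes. The size penalty trades a little accuracy for shorter rules; a real system would need a more careful fitness design, multiple rules per class, and proper train/test evaluation.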



Author information


Correspondence to E. Tarantino.


About this article

Cite this article

De Falco, I., Della Cioppa, A., Iazzetta, A. et al. An evolutionary approach for automatically extracting intelligible classification rules. Knowl Inf Syst 7, 179–201 (2005). https://doi.org/10.1007/s10115-003-0143-4


  • DOI: https://doi.org/10.1007/s10115-003-0143-4
