Abstract
The process of automatically extracting novel, useful and ultimately comprehensible information from large databases, known as data mining, has become very important because of the ever-increasing amounts of data collected by large organizations. Particular emphasis is placed on heuristic search methods able to discover patterns that are hard or impossible to detect using standard query mechanisms and classical statistical techniques. This paper presents an evolutionary system capable of extracting explicit classification rules. Special attention is given to finding easily interpretable rules that may be used to make crucial decisions. The results are compared with those achieved by other methods on a real problem, breast cancer diagnosis.
Cite this article
De Falco, I., Della Cioppa, A., Iazzetta, A. et al. An evolutionary approach for automatically extracting intelligible classification rules. Knowl Inf Syst 7, 179–201 (2005). https://doi.org/10.1007/s10115-003-0143-4
DOI: https://doi.org/10.1007/s10115-003-0143-4