Rule induction is a data mining technique used to extract classification rules of the form IF (conditions) THEN (predicted class) from data. Most rule induction algorithms in the literature follow the sequential covering strategy, which induces one rule at a time until (almost) all the training data is covered by the induced rule set. This strategy describes a basic algorithm composed of several key elements, which can be modified and/or extended to generate new and better rule induction algorithms. With this in mind, this work proposes the use of a grammar-based genetic programming (GGP) algorithm to automatically discover new sequential covering algorithms. The proposed system is evaluated on 20 data sets, and the automatically discovered rule induction algorithms are compared with four well-known human-designed rule induction algorithms. The results show that the GGP system is a promising approach for discovering new sequential covering algorithms.
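To make the sequential covering strategy concrete, the following is a minimal sketch, not the algorithm evolved or evaluated in the chapter. It assumes examples are dictionaries of nominal attribute values with the label stored under a hypothetical "class" key, and it uses a simple greedy accuracy heuristic to refine each rule. All function names, the `min_uncovered` parameter, and the toy data are illustrative assumptions.

```python
from collections import Counter

def covers(rule, example):
    """A rule is a dict {attribute: value}; it covers an example if every condition matches."""
    return all(example[a] == v for a, v in rule.items())

def learn_one_rule(examples, target, attributes):
    """Greedy top-down refinement: start from the empty rule and keep adding the
    attribute=value condition that most improves accuracy for the target class."""
    def acc(subset):
        return sum(e["class"] == target for e in subset) / len(subset)

    rule, covered = {}, list(examples)
    while acc(covered) < 1.0:
        candidates = []
        for a in attributes:
            if a in rule:
                continue
            for v in sorted({e[a] for e in covered}):
                subset = [e for e in covered if e[a] == v]
                candidates.append((acc(subset), len(subset), a, v))
        if not candidates:
            break
        best_acc, _, a, v = max(candidates)
        if best_acc <= acc(covered):
            break  # no condition improves the rule any further
        rule[a] = v
        covered = [e for e in covered if e[a] == v]
    return rule

def sequential_covering(examples, attributes, min_uncovered=0):
    """Induce one rule at a time, removing the examples each rule covers,
    until at most min_uncovered training examples remain (assumed stopping rule)."""
    rules, remaining = [], list(examples)
    while len(remaining) > min_uncovered:
        target = Counter(e["class"] for e in remaining).most_common(1)[0][0]
        rule = learn_one_rule(remaining, target, attributes)
        if not rule:
            break  # an empty rule adds nothing; fall back to the default class
        rules.append((rule, target))
        remaining = [e for e in remaining if not covers(rule, e)]
    pool = remaining if remaining else examples
    default = Counter(e["class"] for e in pool).most_common(1)[0][0]
    return rules, default

def classify(rules, default, example):
    """Ordered rule list: the first rule that covers the example decides the class."""
    for rule, label in rules:
        if covers(rule, example):
            return label
    return default

# Hypothetical toy data, for illustration only.
toy = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "play"},
    {"outlook": "rainy", "windy": "no", "class": "play"},
    {"outlook": "rainy", "windy": "yes", "class": "stay"},
]
rules, default = sequential_covering(toy, ["outlook", "windy"])
```

The rule-refinement heuristic, the stopping criterion, and the choice of an ordered rule list are the kind of "key elements" the abstract refers to: each could be swapped for an alternative to obtain a different sequential covering algorithm.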
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Pappa, G.L., Freitas, A.A. (2008). Discovering New Rule Induction Algorithms with Grammar-based Genetic Programming. In: Maimon, O., Rokach, L. (eds) Soft Computing for Knowledge Discovery and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69935-6_6
DOI: https://doi.org/10.1007/978-0-387-69935-6_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-69934-9
Online ISBN: 978-0-387-69935-6
eBook Packages: Computer Science (R0)