Abstract
A key task in data mining is to produce models that are both accurate and descriptive. ‘Human-readable’ models are often necessary to enable understanding, which can lead to further insight and can induce trust in the user. Rulesets and decision trees, if not too numerous or too large, are readable, unlike, for example, SVM models. However, descriptiveness and accuracy normally conflict, so the challenge is to find algorithms that achieve both high accuracy and high readability. We introduce ORGA (Optimized Ripper using Genetic Algorithm), which hybridizes evolutionary search with the RIPPER ruleset algorithm. RIPPER is effective at producing accurate and readable rulesets, and we show that ORGA provides significant further improvement. Overall, ORGA outperforms a suitable set of comparative algorithms, including implementations of RIPPER, C4.5 and PART. On a majority of the datasets, ORGA outperforms the other algorithms by a substantial margin, and it is rarely dominated in terms of both accuracy and readability.
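The notion of one model being "dominated" on both accuracy and readability can be made concrete with a small sketch. The abstract does not say how readability is measured, so the code below assumes a simple proxy (total number of conditions in a ruleset, lower is better). The model names and all numbers are invented for illustration and are not results from the paper.

```python
from typing import List, NamedTuple


class ModelScore(NamedTuple):
    name: str        # hypothetical model label
    accuracy: float  # higher is better
    size: int        # assumed readability proxy: number of conditions, lower is better


def dominates(a: ModelScore, b: ModelScore) -> bool:
    """Return True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.size <= b.size
    strictly_better = a.accuracy > b.accuracy or a.size < b.size
    return no_worse and strictly_better


def non_dominated(scores: List[ModelScore]) -> List[ModelScore]:
    """Keep only the models that no other model dominates."""
    return [s for s in scores
            if not any(dominates(o, s) for o in scores if o is not s)]


# Made-up scores, for illustration only.
scores = [
    ModelScore("A", 0.90, 12),
    ModelScore("B", 0.88, 30),  # less accurate and larger than A, so A dominates it
    ModelScore("C", 0.92, 45),  # most accurate, but the largest
    ModelScore("D", 0.85, 8),   # least accurate, but the smallest
]

front = non_dominated(scores)
print([s.name for s in front])
```

Under this reading, a model that is "rarely dominated" is one that usually stays in the non-dominated set when compared with its competitors on a given dataset.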
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Daud, M.N.R., Corne, D. (2008). Readable and Accurate Rulesets with ORGA. In: Rudolph, G., Jansen, T., Beume, N., Lucas, S., Poloni, C. (eds) Parallel Problem Solving from Nature – PPSN X. PPSN 2008. Lecture Notes in Computer Science, vol 5199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87700-4_86
DOI: https://doi.org/10.1007/978-3-540-87700-4_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87699-1
Online ISBN: 978-3-540-87700-4
eBook Packages: Computer Science (R0)