Abstract
The evaluation of production rules generated by different data mining algorithms currently depends on the data set used, so their generalization capability cannot be estimated. Our method consists of three steps. First, we take a set of rules, copy them into a population of rules, and perturb the parameters of the individuals in this population. Second, the maximum robustness bounds for the rules are found using genetic algorithms, where the performance of each individual is measured with respect to the training data. Finally, the relationship between maximum robustness bounds and generalization capability is established by statistical analysis over a large number of rules. The significance of this relationship is that it allows rule-mining algorithms to be compared in terms of robustness bounds, independently of the test data. The technique is applied in a case study to a protein sequence classification problem.
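The first two steps can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the rule is a single interval test `lo <= x <= hi` on one feature, a perturbation is a pair of offsets added to the two thresholds, the robustness bound is taken to be the largest max-norm perturbation found that keeps training accuracy within `tol` of the unperturbed rule, and the genetic algorithm is a basic one with truncation selection, one-point crossover, and Gaussian mutation. The names `accuracy` and `robustness_bound` and all parameter values are hypothetical.

```python
import random


def accuracy(rule, data):
    """Fraction of (x, label) pairs that the interval rule classifies correctly."""
    lo, hi = rule
    return sum((lo <= x <= hi) == label for x, label in data) / len(data)


def robustness_bound(rule, data, tol=0.05, max_delta=2.0,
                     pop_size=40, generations=60, seed=0):
    """Search for the largest threshold perturbation that keeps training
    accuracy within `tol` of the unperturbed rule (assumed definition)."""
    rng = random.Random(seed)
    base = accuracy(rule, data)

    def fitness(ind):
        # Acceptable individuals score their perturbation size (max-norm);
        # individuals that degrade training accuracy too much score -1.
        perturbed = (rule[0] + ind[0], rule[1] + ind[1])
        if accuracy(perturbed, data) >= base - tol:
            return max(abs(ind[0]), abs(ind[1]))
        return -1.0

    # Step 1: copy the rule into a population and perturb each copy.
    # The unperturbed copy (0, 0) is kept so one acceptable individual exists.
    pop = [(0.0, 0.0)] + [
        (rng.uniform(-max_delta, max_delta), rng.uniform(-max_delta, max_delta))
        for _ in range(pop_size - 1)
    ]

    # Step 2: evolve the population towards larger acceptable perturbations.
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]      # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a[0], b[1])            # one-point crossover
            child = tuple(                  # Gaussian mutation, clipped to range
                min(max_delta, max(-max_delta, g + rng.gauss(0.0, 0.1)))
                for g in child
            )
            children.append(child)
        pop = parents + children

    best = max(pop, key=fitness)
    return fitness(best), best


if __name__ == "__main__":
    # Toy training set (made up): positives inside [4, 6], negatives outside.
    toy = [(x, True) for x in (4.0, 4.5, 5.0, 5.5, 6.0)]
    toy += [(x, False) for x in (0.0, 1.0, 2.0, 8.0, 9.0, 10.0)]
    bound, delta = robustness_bound((3.0, 7.0), toy, tol=0.0)
    print("estimated robustness bound:", bound, "perturbation:", delta)
```

Under these assumptions, the third step would repeat this search for many rules and relate the resulting bounds to accuracy measured on held-out data, for example through a correlation analysis.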
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, D., Dillon, T.S., Ma, X. (2003). Robustness for Evaluating Rule’s Generalization Capability in Data Mining. In: Gedeon, T.D., Fung, L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science, vol 2903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24581-0_60
DOI: https://doi.org/10.1007/978-3-540-24581-0_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20646-0
Online ISBN: 978-3-540-24581-0
eBook Packages: Springer Book Archive