Abstract
This paper addresses an important problem related to the use of induction systems in analyzing real world data. The problem is the quality and reliability of the rules generated by the systems. We discuss the significance of having a reliable and efficient rule quality measure. Such a measure can provide useful support in interpreting, ranking and applying the rules generated by an induction system. A number of rule quality and statistical measures are selected from the literature and their performance is evaluated on four sets of semiconductor data. The primary goal of this testing and evaluation has been to investigate the performance of these quality measures based on: (i) accuracy, (ii) coverage, (iii) positive error ratio, and (iv) negative error ratio of the rule selected by each measure. Moreover, the sensitivity of these quality measures to different data distributions is examined. In conclusion, we recommend Cohen’s statistic as being the best quality measure examined for the domain. Finally, we explain some future work to be done in this area.
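As a rough illustration of what such measures look like, the sketch below scores a single rule from a 2x2 contingency table. Nothing here is taken from the paper itself: the function name `rule_measures`, the argument names, and the definitions of accuracy (fraction of covered examples that belong to the target class) and coverage (fraction of target-class examples that the rule covers) are assumptions chosen for the example, and the paper may define these terms differently. The kappa value follows the standard definition of Cohen's kappa, (p_o - p_e) / (1 - p_e), applied to the agreement between "rule fires" and "example is in the target class". The positive and negative error ratios are not defined in this abstract, so they are left out.

```python
def rule_measures(tp, fp, fn, tn):
    """Score one rule from a 2x2 contingency table (illustrative only).

    tp: examples covered by the rule that belong to the target class
    fp: examples covered by the rule that do not belong to the class
    fn: class examples the rule fails to cover
    tn: non-class examples the rule correctly leaves uncovered
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("contingency table is empty")

    covered = tp + fp      # examples on which the rule fires
    positives = tp + fn    # examples in the target class

    # Assumed definitions; the paper's own definitions may differ.
    accuracy = tp / covered if covered else 0.0
    coverage = tp / positives if positives else 0.0

    # Standard Cohen's kappa: observed agreement corrected for the
    # agreement expected by chance from the table's marginals.
    p_observed = (tp + tn) / n
    p_expected = (covered * positives
                  + (n - covered) * (n - positives)) / (n * n)
    kappa = ((p_observed - p_expected) / (1 - p_expected)
             if p_expected != 1 else 0.0)

    return {"accuracy": accuracy, "coverage": coverage, "kappa": kappa}


if __name__ == "__main__":
    # Hypothetical counts, not data from the paper.
    print(rule_measures(tp=40, fp=10, fn=20, tn=30))
```

With these hypothetical counts, a rule can look fairly accurate on the examples it covers while kappa stays moderate, because kappa discounts the agreement that the marginal class and coverage frequencies would produce by chance.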
Cite this article
Dean, P., Famili, A. Comparative Performance of Rule Quality Measures in an Induction System. Applied Intelligence 7, 113–124 (1997). https://doi.org/10.1023/A:1008293727412
DOI: https://doi.org/10.1023/A:1008293727412