Applications of rule-base coverage measures to expert system evaluation
Introduction
Evaluation of a knowledge-based system is a multi-faceted problem, with numerous approaches and techniques. The results generated by the system must be evaluated, along with its features, its usability, how easily it can be enhanced, and whether it has a positive impact on the people who use it in place of a non-computer-based approach. The system’s performance must also be evaluated in light of its intended use [1]. If the expert system is meant to function as an intelligent assistant, then it must satisfy the criterion of being a useful adjunct to the human problem solver. If the system is expected to emulate the reasoning of a human expert, then a more rigorous evaluation is needed.
During the last 20 years there has been considerable development and use of knowledge-based systems for medical decision support. In this period there has been heavy emphasis on functional analysis, addressing two primary questions:
- Does the system give the results we expect on test cases?
- Does the system improve the effectiveness of those who use it?
The method we present enhances functional analysis of rule-based classification systems with a rule-base coverage assessment, overcoming limitations of common methods for rule-based expert systems evaluation. The underlying premise of this work is that an ideal testing method is one that guarantees that all possible reasoning paths through a rule-base have been exercised. As with procedural software, this is often an unreasonable and/or unattainable goal, possibly due to a lack of test data, to un-executable program paths, or to the size of the rule-base. Further, even if each possible path is exercised, we cannot realistically do so with each distinct set of test values that could cause its traversal. A reasonable goal is for the rule-base testing process to exercise every inference chain or provide information about the failure of the testing process to do so.
Usually, verification and validation (V & V) of rule-based systems involves a static structural analysis (verification) method to detect internal inconsistencies, followed by a dynamic, functional validation in which system behavior on a set of test cases is compared with expected results. The weakness of a strictly functional approach to validation is that the available test data may not adequately cover the rule-base, and, at best, limited information about coverage will be obtained. System performance statistics are usually presented as if they apply to the entire rule-base, rather than just to the tested sections; this can lead to false estimates of system performance in actual use. The performance indicated by the comparison of actual and expected results is relevant only for the tested sections, while performance in the untested sections cannot be predicted.
We must also consider completeness of the test set and coverage of the rule-base by the test data. Completeness of the test set refers to the degree to which the data represents all types of cases that could be presented to the system under intended conditions of use. Coverage of the rule-base refers to how extensively possible combinations of inference relations are exercised during test data evaluation. In the trivial case, with a correct rule-base and a complete test suite, the test data would completely cover the rule-base, all actual results would agree with expected results, and we could predict completely correct performance of the rule-base in actual use. In the more usual situation we may have errors and incompleteness in the rule-base, as well as inadequacies in the test data. If we judge the system based only on a comparison of actual and expected results, the rule-base could perform well on the test data yet contain errors that go unidentified due to incompleteness of the test data. This could lead to a false prediction of correct performance on all cases, when in fact we cannot make any accurate prediction about performance of the rule-base in those areas for which there is no test data.
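As an illustration of the distinction above, consider a minimal sketch of rule-level coverage measurement. This is not the paper's tool or rule language; the rules, fact names, and representation are invented for the example. Rules fire when all their antecedents hold, and we record which rules the test suite exercises:

```python
# Hypothetical propositional rules: (name, antecedent facts, consequent fact).
RULES = [
    ("r1", frozenset({"fever", "rash"}), "measles_suspected"),
    ("r2", frozenset({"joint_pain", "stiffness"}), "arthritis_suspected"),
    ("r3", frozenset({"arthritis_suspected", "positive_rf"}), "ra_suspected"),
]

def evaluate(case, rules):
    """Forward-chain over the case facts; return derived facts and fired rule names."""
    facts, fired = set(case), set()
    changed = True
    while changed:
        changed = False
        for name, antecedents, consequent in rules:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)
                fired.add(name)
                changed = True
    return facts, fired

def coverage(test_suite, rules):
    """Which rules does the suite exercise, and which are left untested?"""
    fired = set()
    for case in test_suite:
        fired |= evaluate(case, rules)[1]
    all_names = {name for name, _, _ in rules}
    return fired, all_names - fired

suite = [{"fever", "rash"}, {"joint_pain", "stiffness"}]
covered, uncovered = coverage(suite, RULES)
print(sorted(covered))    # rules exercised by the suite
print(sorted(uncovered))  # untested section: no performance prediction here
```

Here the suite agrees perfectly with expected results for the rules it reaches, yet rule r3 is never exercised, so nothing can be predicted about its correctness; this is exactly the gap a purely functional validation hides.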
Our testing approach, as outlined in Fig. 1, allows clear identification of incompleteness in the test data and potential errors in the rule-base through identification of sections of the rule-base that have not been exercised during functional test. This can indicate weaknesses in the test set and/or sections of the rule-base that may not be necessary. An incomplete test set can be supplemented with additional cases chosen from the available population, guided by a series of heuristics and the coverage analysis information. Alternatively, if there is no test data which covers certain parts of the system, it is possible that those sections should be pruned from the rule-base or modified.
Our approach carries out structural analysis of the rule-base using five rule-base coverage measures (RBCMs) which identify sections not exercised by the test data. This makes it possible to improve completeness of the test suite, thereby increasing the kinds of cases on which the rule-base will be tested and improving coverage of the rule-base.
In addition to the coverage analysis, we employ: a rule-base representation which facilitates application of the coverage measures; a set of heuristics for re-sampling the population of available test cases based on coverage information, as shown in Fig. 2; strategies for rule-base pruning and identification of class dependencies; and a rule-base complexity metric. In another study [3] the utility of these techniques is illustrated extensively using rule-bases which were prototypes for the AI/RHEUM system [4] and TRUBAC (Testing with RUle-BAse Coverage), a tool which implements the coverage analysis method.
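One natural form of the re-sampling step can be sketched as a greedy selection over the available population of cases. The actual heuristics appear in [3]; here the rule and case names are invented, and a case is matched directly against a rule's antecedents rather than through an inference chain, for simplicity:

```python
# Greedy re-sampling sketch: from a population of available cases, keep
# only cases that exercise rules the current test suite has left uncovered.
# Rule form (name, antecedents, consequent) is an assumption of this sketch.

def fires(case, rule):
    """True if the case facts directly satisfy all antecedents of the rule."""
    _, antecedents, _ = rule
    return antecedents <= case

def resample(population, uncovered_names, rules_by_name):
    chosen = []
    remaining = set(uncovered_names)
    for case in population:
        hits = {n for n in remaining if fires(case, rules_by_name[n])}
        if hits:                      # case improves coverage: keep it
            chosen.append(case)
            remaining -= hits
        if not remaining:
            break
    return chosen, remaining          # rules still uncovered may be pruning candidates

rules_by_name = {
    "r3": ("r3", frozenset({"arthritis_suspected", "positive_rf"}), "ra_suspected"),
}
population = [{"fever"}, {"arthritis_suspected", "positive_rf"}]
chosen, still_uncovered = resample(population, {"r3"}, rules_by_name)
print(chosen)           # only the second case exercises r3
print(still_uncovered)  # empty: the coverage gap is closed
```

If `still_uncovered` is non-empty after exhausting the population, no available data reaches those rules, which is precisely the situation in which pruning or modification of those sections should be considered.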
Section snippets
Related work
This work builds on both coverage-based testing methods for procedural software (see [5] for a review of methods and [6], [7] for a data-flow approach to testing) and earlier work on rule-base analysis. Early approaches for rule-base analysis carried out only verification or validation. A number of systems, such as the ONCOCIN rule checker program (RCP) [8], CHECK [9], [10], ESC (expert system checker) [11], and KB-Reducer [12], [13] carry out only verification. Beyond their limitation to
Testing with rule-base coverage measures
The first step in rule-base testing with coverage measures is to build a graph representation of the rule-base. Our method uses a directed acyclic graph (DAG) representation. We assume a generic propositional rule-base language [3] into which other rule-base languages can be translated. During construction of the DAG, pairwise redundant rules, pairwise simple contradictory rules and potential contradictions (ambiguities) are identified. After DAG construction is complete, static analysis
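The pairwise redundancy and simple-contradiction checks mentioned above can be sketched for a generic propositional rule form (name, antecedents, consequent). The negation encoding and rule names here are assumptions of this sketch, not the paper's representation, and the full DAG construction is omitted:

```python
from itertools import combinations

def redundant(r1, r2):
    """Pairwise redundant: identical antecedents and identical consequent."""
    return r1[1] == r2[1] and r1[2] == r2[2]

def contradictory(r1, r2, negation=lambda p: "not_" + p):
    """Pairwise simple contradiction: identical antecedents, opposing
    consequents (the 'not_' prefix encoding is an assumption of this sketch)."""
    return r1[1] == r2[1] and r2[2] == negation(r1[2])

def static_checks(rules):
    """Report redundant and contradictory rule pairs found by static analysis."""
    issues = []
    for a, b in combinations(rules, 2):
        if redundant(a, b):
            issues.append(("redundant", a[0], b[0]))
        elif contradictory(a, b) or contradictory(b, a):
            issues.append(("contradiction", a[0], b[0]))
    return issues

rules = [
    ("r1", frozenset({"fever"}), "infection"),
    ("r2", frozenset({"fever"}), "infection"),      # redundant with r1
    ("r3", frozenset({"fever"}), "not_infection"),  # contradicts r1 and r2
]
print(static_checks(rules))
```

Rules with overlapping but non-identical antecedents would be flagged as potential contradictions (ambiguities) rather than simple ones; that weaker check is not shown here.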
Applications of coverage analysis
In addition to providing information about the testing process itself, the coverage analysis can be used to enhance testing and facilitate other kinds of rule-base analysis, as described below.
Conclusions and future work
This work shows that there are numerous uses of rule-base coverage data in the testing process. Rule-base performance evaluation can be misleading unless care is taken to identify problems with both the test data and the rule-base. Both the test data and the rule-base can be improved by using information about the extent to which the test data has covered the rule-base under test.
This work can be extended in a number of directions. Quantitative performance prediction can be computed based on
References (30)
- et al., A decision-table-based processor for checking completeness and consistency in rule-based expert systems, International Journal of Man-Machine Studies (1987)
- et al., A model-based method for computer-aided medical decision-making, Artificial Intelligence (1978)
- Introduction to Expert Systems (1990)
- et al., Expert system verification and validation: a survey and tutorial, Artificial Intelligence Review (1993)
- V. Barr, Applications of rule-base coverage measures to expert system evaluation, PhD thesis, Rutgers University, ...
- L.C. Kingsland, The evaluation of medical expert systems: experiences with the AI/RHEUM knowledge-based consultant ...
- et al., Validation, verification, and testing of computer software, ACM Computing Surveys (1982)
- P. Frankl, E. Weyuker, A data flow testing tool, Proceedings of IEEE Softfair II, San Francisco, December ...
- et al., Selecting software test data using data flow information, IEEE Transactions on Software Engineering (1985)
- et al., An approach to verifying completeness and consistency in rule-based expert system, AI Magazine (1982)
- Knowledge base verification, AI Magazine
- Automatic Refinement of Expert System Knowledge Bases
- Interactive transfer of expertise