The inseparability problem in interactive case-based reasoning
Introduction
In interactive case-based reasoning (CBR) applications such as fault diagnosis, help-desk support, and recommender systems, each of the faults to be identified, or products to be selected, is often represented by a single case in the case library [1], [2], [3], [4], [5]. A case library in which all cases have unique solutions is irreducible in the sense that the deletion of a single case means that the corresponding product or fault is no longer represented in the case library [3], [4]. A problem that often affects retrieval performance in interactive CBR is the inability to distinguish between certain cases. For example, it is not unusual in recommender systems for two distinct products to have the same values for all attributes including price range [2]. While it is unlikely that both products are equally suited to the requirements of the user, the system cannot help the user to choose between them.
We say that two cases are inseparable if they have the same values (or both have missing values) for all attributes in the case library [4]. Inseparability can be caused by inadequacy of the attributes used to index cases. A better understanding of its effects may therefore be of major benefit in case-base construction and maintenance. In this context, the problem is analogous to the inadequacy of attributes in a data set to distinguish between training examples in decision-tree learning [6]. In interactive CBR, inseparability can also arise as a result of incomplete data in the target problem presented for solution. Obtaining data required for retrieval in fault diagnosis, for example, may involve difficult or expensive tests that the user is unable or reluctant to perform [7]. In a recommender system, a user may decline to specify a preferred value for an attribute that she considers to be of no importance. Incomplete data effectively reduces the number of attributes available for retrieval, with the result that certain cases may no longer be distinguishable.
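The definition above can be illustrated with a minimal sketch. The toy case library, the attribute names, and the use of `None` to mark a missing value are all assumptions made for illustration and do not come from the paper. The sketch tests two cases for inseparability and groups a library into sets of mutually inseparable cases. It then shows how omitting an attribute, as happens when the user supplies incomplete data, can merge cases that were previously distinguishable.

```python
from collections import defaultdict

MISSING = None  # assumed marker for a missing attribute value


def inseparable(case_a, case_b, attributes):
    """Return True if both cases have the same value, or are both missing,
    for every attribute in `attributes`."""
    return all(
        case_a.get(attr, MISSING) == case_b.get(attr, MISSING)
        for attr in attributes
    )


def inseparability_classes(cases, attributes):
    """Group case names whose values on `attributes` are identical."""
    groups = defaultdict(list)
    for name, case in cases.items():
        key = tuple(case.get(attr, MISSING) for attr in attributes)
        groups[key].append(name)
    return sorted(groups.values())


# Hypothetical toy library: each product is represented by a single case.
CASES = {
    "camera_1": {"brand": "A", "zoom": "3x", "price": "low"},
    "camera_2": {"brand": "A", "zoom": "5x", "price": "low"},
    "camera_3": {"brand": "B", "zoom": "5x", "price": None},
    "camera_4": {"brand": "B", "zoom": "5x", "price": None},
}
ALL_ATTRIBUTES = ["brand", "zoom", "price"]

# With every attribute available, only camera_3 and camera_4 are inseparable.
full = inseparability_classes(CASES, ALL_ATTRIBUTES)

# If the user declines to specify "zoom", camera_1 and camera_2 merge as well.
reduced = inseparability_classes(CASES, ["brand", "price"])

print(full)
print(reduced)
```

Grouping cases by the tuple of their attribute values avoids comparing every pair of cases, so the partition is found in a single pass over the library.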
In previous work, we have shown that the separability of an irreducible case library, conceptually the opposite of inseparability but easier to quantify, provides an upper bound for the level of precision that can be achieved by any retrieval strategy [4]. In this paper, we present an in-depth analysis of the inseparability problem, its relationship to the problem of incomplete data, and its impact on retrieval performance.
In Section 2, we examine possible retrieval strategies for irreducible case libraries and techniques for their evaluation in terms of retrieval performance. In Section 3, we examine the relationship between separability and precision, and identify conditions in which separability not only provides an upper bound for precision, but actually determines the level of precision that can be achieved by any retrieval strategy. In Section 4, we show that the separability of an irreducible case library can be at least partially evaluated from a decision tree. In Section 5, we examine the effects on separability and precision of increasing levels of incomplete data and the choice of attributes used to index cases in an irreducible case library. Our conclusions are presented in Section 6.
Retrieval strategies
In this section, we examine possible retrieval strategies for irreducible case libraries, and identify conditions for a retrieval strategy to be regarded as ‘well behaved’. We also describe the empirical techniques on which our approach to the evaluation of retrieval performance is based.
Separability and precision
Our previous definition of inseparability [4] did not take account of the attributes available for retrieval. In practice, the attributes available for retrieval may vary depending on the attributes used to index cases or as a result of incomplete data. Here we say that two cases are inseparable with respect to a given set of attributes A if they have the same value, or both have missing values, for every attribute in A. Inseparability with respect to a given set of attributes can be seen to
Identification trees
We refer to a decision tree used to guide retrieval from an irreducible case library as an identification tree [3], [4]. By Theorem 1, the precision provided by an identification tree dynamically constructed from an irreducible case library that contains no missing values is independent of the splitting criterion used to construct the tree. On the other hand, retrieval efficiency, as measured by the average path length of the identification tree, very much depends on the splitting criterion. In
Experimental results
We now present the results of experiments in which we examine the effects on separability and precision of incomplete data and the choice of attributes used to index cases in an irreducible case library. The irreducible case library used in our experiments was created by removing the six examples in the AutoMPG data set [12] that have missing values to provide a case library containing 392 cases. Separability of this case library with respect to the complete set of eight attributes is 84%. As
Conclusions
Our analysis of the inseparability problem in interactive CBR builds on previous work which showed that the separability of an irreducible case library provides an upper bound for the precision that can be achieved by any retrieval strategy [4]. One source of inseparability is inadequacy of the attributes used to index cases in a case library. Inseparability can also arise as a result of missing values in the case library or incomplete data in the target problem presented for solution by a CBR
References (12)
- et al., Conversational case-based reasoning, Applied Intelligence (2001)
- M. Doyle, P. Cunningham, A dynamic approach to reducing dialog in on-line decision guides, Proceedings of the Fifth...
- D. McSherry, Minimizing dialog length in interactive case-based reasoning, Proceedings of the Seventeenth International...
- D. McSherry, Precision and recall in interactive case-based reasoning, Proceedings of the Fourth International...
- et al., Expertguide: a conversational case-based reasoning tool for developing mentors in knowledge spaces, Applied Intelligence (2001)
- Induction of decision trees, Machine Learning (1986)