The inseparability problem in interactive case-based reasoning

https://doi.org/10.1016/S0950-7051(01)00164-2

Abstract

In applications of interactive case-based reasoning (CBR) such as help-desk support and recommender systems, a problem that often affects retrieval performance is the inability to distinguish between cases that have different solutions. For example, it is not unusual in recommender systems for two distinct products or services to have the same values for all attributes in the case library. While it is unlikely that both solutions are equally suited to the user's requirements, the system cannot help the user to choose between them. This problem, which we refer to as inseparability, can also arise as a result of incomplete data in the target problem presented for solution by a CBR system. We present an in-depth analysis of the inseparability problem, its relationship to the problem of incomplete data, and its impact on retrieval performance.

Introduction

In interactive case-based reasoning (CBR) applications such as fault diagnosis, help-desk support, and recommender systems, each of the faults to be identified, or products to be selected, is often represented by a single case in the case library [1], [2], [3], [4], [5]. A case library in which all cases have unique solutions is irreducible in the sense that the deletion of a single case means that the corresponding product or fault is no longer represented in the case library [3], [4]. A problem that often affects retrieval performance in interactive CBR is the inability to distinguish between certain cases. For example, it is not unusual in recommender systems for two distinct products to have the same values for all attributes including price range [2]. While it is unlikely that both products are equally suited to the requirements of the user, the system cannot help the user to choose between them.

We say that two cases are inseparable if they have the same values (or both have missing values) for all attributes in the case library [4]. Inseparability can be caused by inadequacy of the attributes used to index cases. A better understanding of its effects may therefore be of major benefit in case-base construction and maintenance. In this context, the problem is analogous to the inadequacy of attributes in a data set to distinguish between training examples in decision-tree learning [6]. In interactive CBR, inseparability can also arise as a result of incomplete data in the target problem presented for solution. Obtaining data required for retrieval in fault diagnosis, for example, may involve difficult or expensive tests that the user is unable or reluctant to perform [7]. In a recommender system, a user may decline to specify a preferred value for an attribute that she considers to be of no importance. Incomplete data effectively reduces the number of attributes available for retrieval, with the result that certain cases may no longer be distinguishable.
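The definition above can be illustrated with a minimal sketch. It groups cases that agree on every attribute in a chosen attribute set, treating a missing value as a value in its own right so that two cases that both lack a value still match on that attribute. The function name `inseparable_groups`, the `MISSING` marker, and the toy camera library are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Assumption: a missing attribute value is represented by None.
MISSING = None


def inseparable_groups(cases, attributes):
    """Return groups of case names that cannot be told apart.

    Two cases fall into the same group if they have the same value
    (or both have a missing value) for every attribute in `attributes`.
    Only groups with more than one case are returned.
    """
    groups = defaultdict(list)
    for name, values in cases.items():
        # The tuple of values over the chosen attributes is the case's signature.
        key = tuple(values.get(a, MISSING) for a in attributes)
        groups[key].append(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical toy case library: each case is a distinct product.
cases = {
    "camera_a": {"brand": "X", "zoom": "3x", "price": "low"},
    "camera_b": {"brand": "X", "zoom": "3x", "price": "low"},
    "camera_c": {"brand": "X", "zoom": "5x", "price": "low"},
    "camera_d": {"brand": "Y", "zoom": "5x", "price": "high"},
}

# With all attributes available, only camera_a and camera_b are inseparable.
full = inseparable_groups(cases, ["brand", "zoom", "price"])

# If the user declines to specify "zoom", that attribute is unavailable
# for retrieval and camera_c can no longer be distinguished from them.
reduced = inseparable_groups(cases, ["brand", "price"])

print(full)
print(reduced)
```

In this toy library, dropping a single attribute from the target problem enlarges the inseparable group from two cases to three, which is the effect of incomplete data described above.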

In previous work, we have shown that the separability of an irreducible case library, conceptually the opposite of inseparability but easier to quantify, provides an upper bound for the level of precision that can be achieved by any retrieval strategy [4]. In this paper, we present an in-depth analysis of the inseparability problem, its relationship to the problem of incomplete data, and its impact on retrieval performance.

In Section 2, we examine possible retrieval strategies for irreducible case libraries and techniques for their evaluation in terms of retrieval performance. In Section 3, we examine the relationship between separability and precision, and identify conditions in which separability not only provides an upper bound for precision, but actually determines the level of precision that can be achieved by any retrieval strategy. In Section 4, we show that the separability of an irreducible case library can be at least partially evaluated from a decision tree. In Section 5, we examine the effects on separability and precision of increasing levels of incomplete data and the choice of attributes used to index cases in an irreducible case library. Our conclusions are presented in Section 6.

Section snippets

Retrieval strategies

In this section, we examine possible retrieval strategies for irreducible case libraries, and identify conditions for a retrieval strategy to be regarded as ‘well behaved’. We also describe the empirical techniques on which our approach to the evaluation of retrieval performance is based.

Separability and precision

Our previous definition of inseparability [4] did not take account of the attributes available for retrieval. In practice, the attributes available for retrieval may vary depending on the attributes used to index cases or as a result of incomplete data. Here we say that two cases are inseparable with respect to a given set of attributes A if they have the same value, or both have missing values, for every attribute in A. Inseparability with respect to a given set of attributes can be seen to

Identification trees

We refer to a decision tree used to guide retrieval from an irreducible case library as an identification tree [3], [4]. By Theorem 1, the precision provided by an identification tree dynamically constructed from an irreducible case library that contains no missing values is independent of the splitting criterion used to construct the tree. On the other hand, retrieval efficiency, as measured by the average path length of the identification tree, very much depends on the splitting criterion. In

Experimental results

We now present the results of experiments in which we examine the effects on separability and precision of incomplete data and the choice of attributes used to index cases in an irreducible case library. The irreducible case library used in our experiments was created by removing the six examples in the AutoMPG data set [12] that have missing values to provide a case library containing 392 cases. Separability of this case library with respect to the complete set of eight attributes is 84%. As

Conclusions

Our analysis of the inseparability problem in interactive CBR builds on previous work which showed that the separability of an irreducible case library provides an upper bound for the precision that can be achieved by any retrieval strategy [4]. One source of inseparability is inadequacy of the attributes used to index cases in a case library. Inseparability can also arise as a result of missing values in the case library or incomplete data in the target problem presented for solution by a CBR

References (12)

  • D.W. Aha et al., Conversational case-based reasoning, Applied Intelligence (2001)
  • M. Doyle, P. Cunningham, A dynamic approach to reducing dialog in on-line decision guides, Proceedings of the Fifth...
  • D. McSherry, Minimizing dialog length in interactive case-based reasoning, Proceedings of the Seventeenth International...
  • D. McSherry, Precision and recall in interactive case-based reasoning, Proceedings of the Fourth International...
  • H. Shimazu et al., Expertguide: a conversational case-based reasoning tool for developing mentors in knowledge spaces, Applied Intelligence (2001)
  • J.R. Quinlan, Induction of decision trees, Machine Learning (1986)
There are more references available in the full text version of this article.
