The interaction between classification and reject performance for distance-based reject-option classifiers

https://doi.org/10.1016/j.patrec.2005.10.015

Abstract

Consider the class of problems in which a target class is well-defined and an outlier class is ill-defined. In these cases new outlier classes can appear, or the class-conditional distribution of the outlier class itself may be poorly sampled. A strategy to deal with this problem involves a two-stage classifier, in which one stage is designed to discriminate between known classes, and the other encloses known data to protect against changing conditions. The two stages are, however, interrelated, implying that optimising one may compromise the other. In this paper the relation between the two stages is studied within an ROC analysis framework. We show how the operating characteristics can be used both for model selection and to aid the choice of the reject threshold. An analytic study on a controlled experiment is performed, followed by experiments on real-world datasets with the distance-based reject-option classifier.

Introduction

In pattern recognition, a typical assumption made during the design phase is that the various classes involved in a particular problem can be sampled reliably. However, in some problems, new classes or clusters may appear in the production phase that were not present during design/training. In other problems, some classes may be sampled poorly, leading to inaccurate class models. Examples of affected applications include:

  • Diagnostic problems in which the objective of the classifier is to distinguish abnormal operation from normal operation (Dubuisson and Masson, 1993). It is often the case that a representative training set can be gathered for one of the classes, but due to the nature of the problem, the other class cannot be sampled in a representative manner. For example, in machine fault diagnosis (Ypma et al., 1999), a destructive test for all possible abnormal states may be infeasible or very expensive.

  • Recognition systems that involve a rejection and classification stage, for example road sign classification. Here a classifier must not only discriminate between examples of road sign classes, but also reject non-sign examples (Paclík, 2004). Gathering a representative set of non-signs may not be possible. Similar examples are face detection (Pham et al., 2002), where a classifier must deal with well-defined face classes and an ill-defined non-face class, and handwritten digit recognition (Liu et al., 2002), where non-digit examples are a serious issue.

For simplicity we consider the problem as one in which there is a well-defined target class, and a poorly defined outlier class. The primary objective is to maintain a high classification performance between known classes, and simultaneously to protect the classes of interest from new/unseen classes (or changes in expected conditions, reflected in the change of distribution of these classes). We refer to the latter performance measure as rejection performance. Classification performance is defined between a well-defined target class ωt, and some partial knowledge existing for the outlier class ωo. Rejection performance is defined between ωt and a new (unseen) cluster/class from the outlier class ωr that is not defined precisely in training.

Several strategies have been proposed. The first strategy to cope with this situation, called the distance-based reject-option, was proposed in (Dubuisson and Masson, 1993). Here a reject-rule was proposed to reject objects distant from the target class after classification. This differs considerably from the second strategy, the ambiguity reject-option (defined in (Dubuisson and Masson, 1993)) as proposed in (Chow, 1970). In ambiguity reject, a threshold is included to reject objects occurring in the overlap region between two known classes, under the assumption that all classes have been sampled in a representative manner. This contrasts with the present study, in which it is assumed that classes are poorly sampled or not sampled at all.
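As a point of reference, Chow's ambiguity reject-rule can be sketched in a few lines. This is a minimal illustration, not code from the paper, and the threshold value is arbitrary:

```python
import numpy as np

def chow_ambiguity_reject(posteriors, t=0.8):
    """Chow's ambiguity reject-rule: reject a sample when its maximum
    class posterior falls below threshold t, i.e. when it lies in the
    overlap region of the known classes. Returns the winning class
    index, or -1 for 'reject'."""
    posteriors = np.asarray(posteriors, dtype=float)
    k = int(np.argmax(posteriors))
    return k if posteriors[k] >= t else -1
```

A confident sample with posteriors (0.95, 0.05) is assigned class 0, while an ambiguous (0.55, 0.45) is rejected. Note that this rule presumes all class posteriors are well estimated, which is exactly what fails in the ill-defined setting studied here.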

Classifiers with this reject-rule differ from conventional classifiers in that two thresholds are used to specify the target area, namely a classification threshold θ and a rejection threshold td (we define the target area to be the region in feature space in which all examples are labelled target). A limitation of the distance-reject criterion is that the threshold itself has no direct relationship with the distribution of the known classes, as discussed in (Muzzolini et al., 1978). Thus a modified reject-rule was proposed in (Muzzolini et al., 1978), involving computing the probability that a new object belongs to any of the known classes, based on covariance estimates. The threshold can then be based on a degree of model-fit to the known classes.
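The interplay of the two thresholds can be illustrated with a one-dimensional toy construction of our own (not the paper's models): two unit-variance Gaussian classes with means mu_t and mu_o and equal priors, where theta thresholds the target posterior and t_d thresholds the squared distance to the target mean:

```python
import numpy as np

def reject_option_classify(x, mu_t, mu_o, theta=0.5, t_d=9.0):
    """Two-stage decision: distance-reject first, then classify.

    Stage 1 (rejection threshold t_d): objects far from the target
    class model are rejected outright, protecting against unseen classes.
    Stage 2 (classification threshold theta): remaining objects are
    assigned to target/outlier by thresholding the target posterior.
    """
    if (x - mu_t) ** 2 > t_d:
        return "reject"
    # Unnormalised unit-variance Gaussian likelihoods, equal priors.
    p_t = np.exp(-0.5 * (x - mu_t) ** 2)
    p_o = np.exp(-0.5 * (x - mu_o) ** 2)
    return "target" if p_t / (p_t + p_o) >= theta else "outlier"
```

Raising t_d enlarges the target area (more targets kept, but weaker protection against unseen objects); lowering it shrinks the area and can start rejecting genuine targets, which is the interaction studied in this paper.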

In (Landgrebe et al., 2004) we presented a third reject strategy, involving combinations of one-class (Tax, 2001) and supervised classifiers. This scheme allows different models to be designed specifically for classification or for rejection. It was argued that a model optimised for classification may differ from one optimised for rejection, and that combining both optimised models can improve the overall combined classification/rejection performance. Experiments showed that this strategy outperforms the other reject-rules in some situations. It was also observed that a relation between classification and rejection performance exists, and that optimising either performance comes at the detriment of the other.

Each of the strategies has a classification and a rejection threshold. In both (Dubuisson and Masson, 1993, Muzzolini et al., 1978), it has been shown how the distance-reject-rule can be applied in practice, involving distance- or class-conditional probability-thresholding of new incoming objects. In the case of the ambiguity reject-option, the classifiers can be evaluated and optimised since it is assumed that all classes have been sampled, as shown in (Chow, 1970) for known costs, and applied to imprecise environments in (Ferri and Hernandez-Orallo, 2004, Tortorella, 2004), to name a few. In the case of the distance-based reject-option, however, a challenge is that the distribution of the unseen class is by definition absent, and thus standard cost-sensitive evaluations and optimisations become ill-defined, lacking a closed Bayesian formalism.

In (Landgrebe et al., 2004), the ill-defined class problem was tackled by deriving strategies to study the way in which classification and rejection performance interact, based on the assumption that a new unseen class could occur anywhere in feature space. The rationale is that a minimal target area provides, in general, the most robust solution to an unseen class that could occur anywhere in feature space.1 The methodology involved generating the unseen class artificially, assuming it to be uniformly distributed. Based on this methodology, it was observed that, similar to the ambiguity-reject case, there is interaction between classification and rejection performance.
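The uniform-unseen-class methodology can be approximated by Monte Carlo. The following sketch is under our own assumptions; the bounding box and accept function are hypothetical stand-ins for a trained target model:

```python
import numpy as np

def rejection_rate_uniform(accept_fn, lo, hi, n=100_000, seed=0):
    """Estimate rejection performance against an artificial unseen class,
    modelled as uniform over the box [lo, hi]^d: draw n uniform samples
    and report the fraction falling OUTSIDE the target area. This is a
    Monte Carlo proxy for 1 - (target area volume / box volume)."""
    rng = np.random.default_rng(seed)
    lo = np.atleast_1d(np.asarray(lo, dtype=float))
    hi = np.atleast_1d(np.asarray(hi, dtype=float))
    samples = rng.uniform(lo, hi, size=(n, lo.size))
    return 1.0 - accept_fn(samples).mean()

# Example: target area = disc of squared radius 2 around the origin.
in_disc = lambda X: (X ** 2).sum(axis=1) <= 2.0
rate = rejection_rate_uniform(in_disc, [-5.0, -5.0], [5.0, 5.0])
```

For this disc the exact rejection rate is 1 − 2π/100 ≈ 0.937, which the estimate approaches for large n; smaller target areas yield rates closer to 1, matching the minimal-target-area rationale above.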

This paper is concerned with evaluating and optimising classifiers taking into account this interaction between classification and rejection. For this, receiver operating characteristic (ROC) curves will be used. ROC analysis (Metz, 1978) is a tool typically used in the evaluation of two-class classifiers in imprecise environments, plotting the detection rate (true-positive rate) against the false-positive rate. We extend this analysis to the unseen-class problem by including an additional dimension related to the general robustness of the classifier to an unseen class. A similar 3-dimensional ROC analysis has been applied elsewhere, such as in (Ferri and Hernandez-Orallo, 2004, Mossman, 1999, Dreisetl et al., 2000), but in these cases it did not involve the ill-defined class problem. Our approach attempts to minimise the volume occupied by the classes of interest in feature space for robustness against unseen classes. It allows models to be compared (in a relative sense, since an absolute measure cannot be obtained) and provides insight into the choice of a reject threshold that does not unduly degrade classification performance.
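The extra dimension can be sketched as follows. This is a simplification of the 3-D operating characteristic discussed here, assuming for illustration that a single distance threshold is swept; the variable names are ours:

```python
import numpy as np

def extended_operating_points(d_target, d_outlier, d_uniform, thresholds):
    """For each reject threshold, compute three performances:
    - target-accept rate (targets kept inside the target area),
    - outlier-accept rate (known outliers wrongly kept),
    - unseen-reject rate (artificial uniform samples rejected).
    d_* hold the distance of each sample to the target model."""
    d_target = np.asarray(d_target, dtype=float)
    d_outlier = np.asarray(d_outlier, dtype=float)
    d_uniform = np.asarray(d_uniform, dtype=float)
    pts = [(np.mean(d_target <= t),
            np.mean(d_outlier <= t),
            np.mean(d_uniform > t)) for t in thresholds]
    return np.array(pts)
```

Each row is one point on the extended operating characteristic; sweeping both the classification and rejection thresholds, as in the paper, yields a surface rather than a curve.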

In Section 2, an example is studied analytically to investigate the nature of the relation between classification and rejection rates, and the extended ROC analysis is presented. In Section 3, a criterion is proposed for the comparison of the extended ROCs. This criterion is applied to a synthetic 2-dimensional example with three different models. Finally, we discuss how to optimise an operating point (i.e. choose a classification and a rejection threshold). Section 4 consists of a number of experiments to demonstrate the methodology in some realistic scenarios. Conclusions are given in Section 5.

Section snippets

The relation between classification and rejection performance

First we develop our notation and illustrate the interaction between classification and rejection performance with an example. In Fig. 1, a synthetic example is presented in which ωt and ωo are two Gaussian-distributed classes over a domain x. Additionally we assume that a class ωr is uniformly distributed across x. The class-conditional densities for ωt, ωo and ωr are denoted p(x|ωt), p(x|ωo), and p(x|ωr), respectively, with priors p(ωt), p(ωo), and p(ωr), which are

Model selection and optimisation

Now a model selection criterion is formalised that makes use of the full operating characteristics, extending ROC analysis to this problem domain. This will be developed and demonstrated by a synthetic example using three different classifier models.
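A minimal numeric instance of such a synthetic setup makes the trade-off the criterion must balance concrete. The parameters here are our own hypothetical choices, not the paper's: unit-variance Gaussians at 0 and 3, an unseen class uniform on [-10, 10], and an assumed target area [-2.5, 2.5]:

```python
import math

def gauss_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian CDF via the standard-library error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu_t, mu_o = 0.0, 3.0          # target and outlier means (assumed)
t_lo, t_hi = -2.5, 2.5         # assumed target area

# Classification side: mass of each known class inside the target area.
target_accept = gauss_cdf(t_hi, mu_t) - gauss_cdf(t_lo, mu_t)   # ~0.988
outlier_accept = gauss_cdf(t_hi, mu_o) - gauss_cdf(t_lo, mu_o)  # ~0.309
# Rejection side: uniform unseen mass outside the area on [-10, 10].
unseen_reject = 1.0 - (t_hi - t_lo) / 20.0                      # 0.75
```

Widening the target area raises target_accept but lowers unseen_reject (and raises outlier_accept), so no single threshold optimises all three performances at once.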

Experiments

In this section experiments on a number of real-world datasets are conducted, demonstrating practical application of the proposed ROC analysis methodology. Model selection criteria are compared for a number of competing models, and the performance of a classifier with reject-option is compared to that of the same model without reject-option. In each case, an independent test set is applied in which the ωr class is unseen in training, simulating the effect that an unseen class may have on each classifier. These

Conclusion

Classifiers designed to protect a well-defined target class from ill-defined conditions, such as new unseen classes, are defined by two decision thresholds, namely a classification and a rejection threshold. The classification threshold is designed to provide an optimal trade-off between known classes, and the rejection threshold protects the target class against changes in conditions, e.g. new unseen classes.

In this paper, we discussed the fact that classification and rejection performances are

Acknowledgements

This research is/was supported by the Technology Foundation STW, applied science division of NWO and the technology programme of the Ministry of Economic Affairs. A special mention is given to the anonymous reviewers who helped clarify some aspects of this work.

References (20)

