Dissimilarity representations allow for building good classifiers

https://doi.org/10.1016/S0167-8655(02)00024-7

Abstract

In this paper, a classification task on dissimilarity representations is considered. A traditional way to discriminate between objects represented by dissimilarities is the nearest neighbor method. It suffers, however, from a number of limitations: high computational complexity, a potential loss of accuracy when a small set of prototypes is used, and sensitivity to noise. To overcome these shortcomings, we propose to use a normal density-based classifier constructed on the same representation. We show that such a classifier, based on a weighted combination of dissimilarities, can significantly improve on the nearest neighbor rule with respect to both recognition accuracy and computational effort.

Introduction

The challenge of automatic pattern recognition is to develop computer methods which learn to distinguish among a number of classes represented by examples. First, an appropriate representation of objects should be found. Then, a decision rule can be constructed, which discriminates between different categories and which is able to generalize well (achieve a high accuracy when novel examples appear). One of the possible representations is based on similarity or dissimilarity relations between objects. When properly defined, it might be advantageous for solving class identification problems. Such a recommendation is supported by the fact that (dis)similarities can be considered as a connection between perception and higher-level knowledge, being a crucial factor in the process of human recognition and categorization (Goldstone, 1999; Edelman, 1999; Wharton et al., 1992).

In contrast to this observation, objects are conventionally represented by characteristic features (Duda et al., 2001). In some cases, however, a feasible feature-based description of objects might be difficult to obtain or inefficient for learning purposes, e.g., when experts cannot define features in a straightforward way, when data are high dimensional, or when features consist of both continuous and categorical variables. Then, the use of dissimilarities, built directly on measurements, e.g., based on template matching, is an appealing alternative. Also, in some applications, e.g., 2D shape recognition (Edelman, 1999), the use of dissimilarities makes the problem more viable.

The nearest neighbor method (NN) (Cover and Hart, 1967) is traditionally applied to dissimilarity representations. Although this decision rule is based on local neighborhoods, i.e., one or a few neighbors, it is still computationally expensive, since the dissimilarities to all training examples have to be computed. Another drawback is that its performance may deteriorate when the training set is small. To overcome such limitations and improve the recognition accuracy, we propose to replace this method by a more global decision rule. Such a classifier is constructed from a training set represented by the dissimilarities to a set of prototypes, called the representation set. If this set is small, only a few dissimilarities have to be computed to evaluate a new object, while the classifier may still profit from the accuracy offered by a large training set.
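To make the computational argument concrete, the following is a minimal sketch of the k-NN rule operating on a precomputed dissimilarity representation. The function name and the majority-vote tie handling are our illustrative assumptions, not the paper's implementation; the point it shows is that classifying a single new object requires its dissimilarities to all n training objects.

```python
import numpy as np

def knn_on_dissimilarities(d_new, train_labels, k=1):
    """k-NN rule on a dissimilarity representation (illustrative sketch).

    d_new        : (n,) dissimilarities from one new object to ALL n
                   training objects -- every one must be computed,
                   which is the cost the paper seeks to reduce.
    train_labels : (n,) class labels of the training objects.
    """
    nearest = np.argsort(d_new)[:k]        # indices of the k smallest dissimilarities
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]       # majority vote (ties broken arbitrarily)
```

A more global classifier, as proposed below, replaces this per-object scan over the whole training set by a fixed set of r prototypes.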

Throughout this paper, all our investigations are devoted to dissimilarity representations, assuming that no other representations (e.g., features) are available to the researcher. The goal of this work is to propose a novel, advantageous approach to learning only from dissimilarity (distance) representations, dealing with classification problems in particular. Our experiments will demonstrate that the tradeoff between recognition accuracy and computational effort is significantly improved by using a normal density-based classifier built on dissimilarities instead of the NN rule. This paper is organized as follows. Section 2 gives a more detailed description of dissimilarity representations and the decision rules considered. Section 3 describes the datasets used and the experiments conducted. The results are discussed in Section 4 and the conclusions are summarized in Section 5. The essential idea of this paper has been published in Electronics Letters (Pękalska, 2001). Some earlier elements of the presented research can be found in Duin et al. (1999) and Pękalska and Duin (2000).

Learning from dissimilarities

To construct a classifier on dissimilarities, the training set T of n objects and the representation set R (Duin, 2000) of size r will be used. R is a set of prototypes covering all classes present. R is chosen to be a subset of T (R ⊆ T), although, in general, R and T might be disjoint. In the learning process, a classifier is built on the n×r distance matrix D(T,R), relating all training objects to all prototypes. The information on a set S of s new objects is provided in terms of their dissimilarities to the prototypes in R, i.e., the s×r matrix D(S,R).
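As an illustration of this construction, the sketch below builds a linear normal-density-based classifier on D(T,R), treating the r dissimilarities to the prototypes as ordinary features. It is a minimal reconstruction under common assumptions (an equal-covariance Gaussian model with a ridge-regularized pooled covariance); the paper's exact RLNC regularization may differ.

```python
import numpy as np

class RLNC:
    """Sketch of a regularized linear normal-density-based classifier
    built on a dissimilarity representation D(T, R): the r dissimilarities
    of an object to the prototypes in R are treated as ordinary features.
    The ridge regularization below is our assumption, not necessarily
    the paper's exact choice."""

    def __init__(self, reg=0.01):
        self.reg = reg  # ridge strength for the pooled covariance

    def fit(self, D, y):
        # D: (n, r) dissimilarity matrix D(T, R);  y: (n,) class labels
        self.classes_ = np.unique(y)
        n, r = D.shape
        self.means_ = np.stack([D[y == c].mean(axis=0) for c in self.classes_])
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        # Pooled within-class covariance (equal-covariance Gaussian model)
        scatter = sum(np.cov(D[y == c], rowvar=False) * (np.sum(y == c) - 1)
                      for c in self.classes_)
        S = scatter / (n - len(self.classes_))
        S += self.reg * np.trace(S) / r * np.eye(r)  # regularize toward identity
        self.S_inv_ = np.linalg.inv(S)
        return self

    def predict(self, D_new):
        # Linear discriminant: g_c(x) = x' S^-1 m_c - 0.5 m_c' S^-1 m_c + log p_c
        lin = D_new @ self.S_inv_ @ self.means_.T             # (m, C)
        quad = 0.5 * np.einsum('cr,rs,cs->c', self.means_, self.S_inv_, self.means_)
        G = lin - quad + np.log(self.priors_)
        return self.classes_[np.argmax(G, axis=1)]
```

The resulting decision is linear in the dissimilarities, i.e., a weighted combination of them, as stated in the abstract; evaluating a new object then needs only its r dissimilarities to the prototypes rather than dissimilarities to all n training objects.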

Datasets and the experimental set-up

A number of experiments are conducted to compare the results of the k-NN rule and the RLNC/RQNC built on dissimilarities. They are designed to observe and analyze the behavior of these classifiers in relation to different sizes of the representation and training sets. Small representation sets are of interest because they lower the complexity of representing and evaluating new objects. This matters both for storage and for computational cost. Our concern is then how …
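One run of such an experiment can be sketched as follows, reusing the RLNC class from the previous sketch. The random selection of the representation set follows the paper; the function names and the simple test-error estimate are our illustrative assumptions, and the paper's alternative MD selection criterion (defined in the full text) is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_random_R(D_train, y_train, D_test, y_test, r):
    """Estimate the test error of the RLNC for a randomly chosen
    representation set R of r prototypes (illustrative sketch).

    D_train : (n, n) dissimilarities among the n training objects
    D_test  : (m, n) dissimilarities from m test objects to the training objects
    """
    idx = rng.choice(D_train.shape[0], size=r, replace=False)  # random R, a subset of T
    clf = RLNC().fit(D_train[:, idx], y_train)   # trained on D(T, R)
    y_hat = clf.predict(D_test[:, idx])          # only r dissimilarities per new object
    return float(np.mean(y_hat != y_test))
```

Repeating this over a range of r, and over several random draws of R, yields error curves of the kind discussed in the results section.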

Results

The generalization error rates of the k-NN rule and the RLNC/RQNC for three datasets are presented in Figs. 3–5. The k-NN results, marked by stars (∗), are presented on the line r_c = n_c. The results depend either on the random selection of the representation set (left subplots) or on the MD criterion (right subplots). Since the k-NN results are worse in the case of the MD selection, the k-NN results always refer to the random selection (also in the right subplots). The RLNC's (RQNC's) …

Discussion and conclusions

Our experiments confirm that the RLNC constructed on the dissimilarity representations D(T,R) nearly always outperforms the k-NN rule based on the same R. This holds for the RQNC as well, provided that each class is represented by a sufficient number of objects. Since the computational complexity of evaluating new objects (here mainly indicated by the number of prototypes, as explained in Section 2.3) is an important issue, our study is conducted with this emphasis. We have found that for …

Acknowledgements

This work is partly supported by the Dutch Organization for Scientific Research (NWO).

References

  • R.P.W. Duin et al., Relational discriminant analysis, Pattern Recognition Lett. (1999)
  • D.W. Aha et al., Instance-based learning algorithms, Mach. Learning (1991)
  • T.M. Cover et al., Nearest neighbor pattern classification, IEEE Trans. Inf. Theory (1967)
  • P.A. Devijver et al., Pattern Recognition: A Statistical Approach (1982)
  • M.P. Dubuisson et al., Modified Hausdorff distance for object matching, Proc. Int. Conf. on Pattern Recognition (1994)
  • R.O. Duda et al., Pattern Classification (2001)
  • R.P.W. Duin, Classifiers in almost empty spaces, Proc. Int. Conf. on Pattern Recognition (2000)
  • S. Edelman, Representation and Recognition in Vision (1999)
  • K. Fukunaga, Introduction to Statistical Pattern Recognition (1990)
  • R.L. Goldstone, Similarity (1999)
