Relational discriminant analysis

https://doi.org/10.1016/S0167-8655(99)00085-9

Abstract

Relational discriminant analysis is based on a proximity description of the data. Instead of features, the similarities to a subset of the objects in the training set are used for representation. In this paper we show that this subset may be small and that its exact choice is of minor importance. Moreover, it is shown that linear and non-linear methods for feature extraction based on multi-dimensional scaling are not, or only marginally, better than such subsets. Selection thereby drastically simplifies the problem of dimension reduction. Relational discriminant analysis may thus be a valuable pattern recognition tool for applications in which the choice of features is uncertain.

Introduction

In statistical pattern recognition, objects are traditionally represented by features. Recently (Duin et al., 1997) we argued that, formally, objects may also be represented by their proximities (distances, similarities) to a set of prototypes or support objects. This leads to a featureless approach to pattern recognition in which the application expert expresses the domain knowledge by defining the proximity measure instead of a set of features. Finding classifiers between classes represented in this way is called relational discriminant analysis.

This approach bears some resemblance to one of the early methods for pattern recognition, template matching. That method is also featureless and is also based on a proximity measure. What is discussed here goes much further, as we build discriminant functions on similarities and try to reduce the complexity of the representation. In this study, the similarities to particular objects in the training set take over the role of the features in traditional feature-based methods.

The starting point of this analysis is an m×m matrix D between all m training objects and a corresponding set of m labels Λ. We can relate them uniquely by a weight vector w as

Dw = Λ, so w = D⁻¹Λ, if rank(D) = m.    (1)

For a new object x, represented as a set of similarities to the m training objects, the label can now be estimated as

λ_x = xᵀw.    (2)
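As a minimal sketch of Eqs. (1) and (2), assuming a two-class problem with labels coded as ±1 and hypothetical function names (neither is specified in the text):

    import numpy as np

    def fit_full_relational_discriminant(D, labels):
        # Solve D w = Lambda exactly, Eq. (1); this requires rank(D) = m,
        # i.e. a non-singular square proximity matrix over all m training objects.
        return np.linalg.solve(D, labels)

    def estimate_label(x, w):
        # Eq. (2): lambda_x = x^T w, where x holds the similarities of a new
        # object to the m training objects; its sign gives the class for +/-1 labels.
        return x @ w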

The problem with this discriminant is that it may generalize poorly, as it has no capacity for averaging out noise. If a smaller representation set R is used, containing n ≪ m objects, then the corresponding matrix D_r has size m×n and the weight vector is reduced to n weights. Eq. (1) can now be solved by a mean-square-error procedure (Fisher's linear discriminant), but other discriminant functions may be used as well.
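The sketch below shows one way to realize such a mean-square-error fit on the reduced matrix; the function name and the ±1 label coding are again assumptions:

    import numpy as np

    def fit_reduced_relational_discriminant(D_r, labels):
        # D_r is the m x n matrix of similarities of all m training objects to
        # the n objects of the representation set R, with n << m.  A
        # mean-square-error fit of the n weights replaces the exact solution of Eq. (1).
        w, _, _, _ = np.linalg.lstsq(D_r, labels, rcond=None)
        return w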

A dimension reduction is important for two reasons. First, due to the curse of dimensionality (Jain and Chandrasekaran, 1987), the accuracy of a classifier trained on m objects improves if the objects are represented in a space of dimensionality n ≪ m. Secondly, it reduces the cost of measurements and computations when classifying new objects, as they need to be represented only by their similarities to the objects in R. In this way, the representation set replaces the traditional feature set.

In this paper we will show that, in a practical application, the selection of objects is (almost) as good as feature extraction by linear as well as non-linear methods for multi-dimensional scaling. Next we will analyze how critical the choice of the representation set R is. We will show that it is hard to improve on the performance of a simple random selection. This makes relational discriminant analysis a very simple procedure.

Section snippets

Dimension reduction by the selection of objects

As stated, the reduction of the number of columns in D from m to n by selecting a representation set is almost identical to the feature selection problem. There is, however, one important difference between the similarities to a representation set and a feature set: features can differ widely; some features may even be unique. The similarities to the objects in the representation set, on the other hand, have a uniform interpretation. They can be very similar, as they arise from the same …
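As an illustration of such a selection, a minimal sketch of drawing a random representation set from the full proximity matrix (the function name and the unstratified uniform draw are assumptions; in the experiments the draw is arranged so that all classes are represented):

    import numpy as np

    def random_representation_set(D, n, seed=0):
        # Keep n of the m training objects as the representation set R and
        # retain only the corresponding columns of the full m x m matrix D.
        rng = np.random.default_rng(seed)
        idx = rng.choice(D.shape[1], size=n, replace=False)
        return D[:, idx], idx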

Feature extraction by multi-dimensional scaling

Instead of selecting objects, a dimension reduction in the relational representation can be achieved by mapping the original set of objects onto a feature space of a given reduced dimensionality. In order to preserve the original structure defined by the proximity matrix, we demand that the distances in the new space reflect these proximities as well as possible. This technique is called multi-dimensional scaling (Borg and Groenen, 1997). Because this technique is based on proximity …
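The text does not state which scaling variants were evaluated beyond the reference to Borg and Groenen (1997); as a sketch, the linear (classical, Torgerson) variant can be written as follows, assuming D contains dissimilarities:

    import numpy as np

    def classical_mds(D, k):
        # Embed the m objects in k dimensions such that Euclidean distances
        # approximate the given dissimilarities in the m x m matrix D.
        m = D.shape[0]
        J = np.eye(m) - np.ones((m, m)) / m     # centering matrix
        B = -0.5 * J @ (D ** 2) @ J             # double-centred squared distances
        eigval, eigvec = np.linalg.eigh(B)      # eigenvalues in ascending order
        top = np.argsort(eigval)[::-1][:k]      # indices of the k largest ones
        scale = np.sqrt(np.clip(eigval[top], 0.0, None))
        return eigvec[:, top] * scale           # m x k configuration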

Experiments

We used a character database consisting of 200×10 handwritten numerals (200 objects per class, 10 classes), each originally represented by 30×48 binary pixels. Out of this dataset, 5 different feature sets are derived: pixel (240 averages of 2×3 pixel windows), face (216 face distances), Fourier (76 Fourier shape descriptors), Karhunen–Loève (64 weights) and Zernike (47 rotation-invariant moments plus 6 morphological features). In total there are 649 features. See also (van Breukelen et al., 1997).

For each feature set a 2000×2000 …
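A 2000×2000 proximity matrix for each feature set could, for example, be computed as sketched below; the Euclidean distance and the function name are assumptions, as the excerpt does not specify the proximity measure used:

    import numpy as np

    def proximity_matrix(X):
        # X is the 2000 x d data matrix of one feature set; the result is the
        # 2000 x 2000 matrix of pairwise Euclidean distances between objects.
        sq = np.sum(X ** 2, axis=1)
        D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.sqrt(np.clip(D2, 0.0, None))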

Discussion

Table 1 shows that if all classes and all datasets are represented in the subset, a random selection yields almost the same performance as any of the more advanced selection methods and mapping techniques. The poor results of the NN classifier for the feature extraction methods may be explained by the fact that multi-dimensional scaling optimizes the set of distances globally, thereby affecting the NN relations.

Note that the random subset selection needs a training effort of about 1 minute on a Sun …

Discussion

Raghavan: When we were both here during the previous conference in the “Pattern Recognition in Practice” series we were talking about a paper by Lev Goldfarb, somewhat related, I think, to multi-dimensional scaling. (Note of the editors: see the discussion in: R.P.W. Duin, D. de Ridder and D.M.J. Tax. Experiments with a Featureless Approach to Pattern Recognition, Pattern Recognition Letters, 18, 1997, pp. 1159–1166.) The results of that paper allowed the type of distances to be more general …

