Relational discriminant analysis

https://doi.org/10.1016/S0167-8655(99)00085-9

Abstract

Relational discriminant analysis is based on a proximity description of the data. Instead of features, the similarities to a subset of the objects in the training set are used for representation. In this paper we show that this subset may be small and that its exact choice is of minor importance. Moreover, it is shown that linear and non-linear methods for feature extraction based on multi-dimensional scaling are not, or only marginally, better than such subsets. Selection thereby drastically simplifies the problem of dimension reduction. Relational discriminant analysis may thus be a valuable pattern recognition tool for applications in which the choice of features is uncertain.

Introduction

In statistical pattern recognition, objects are traditionally represented by features. Recently (Duin et al., 1997) we argued that, formally, objects may also be represented by their proximities (distances, similarities) to a set of prototypes or support objects. This leads to a featureless approach to pattern recognition in which the application expert expresses the domain knowledge by defining the proximity measure instead of a set of features. Finding classifiers between classes represented in this way is called relational discriminant analysis.

This approach bears some resemblance to one of the early methods for pattern recognition, template matching. That method is also featureless and is also based on a proximity measure. What is discussed here goes much further, as we build discriminant functions on similarities and try to reduce the complexity of the representation. In this study, the similarities to particular objects in the training set take over the role of the features in traditional feature-based methods.

The starting point of this analysis is an m×m matrix D between all m training objects and a corresponding set of m labels Λ. We can relate them uniquely by a weight vector w as

Dw = Λ, so w = D⁻¹Λ, if rank(D) = m.    (1)

For a new object x, represented as a set of similarities to the m training objects, the label can now be estimated as

λ_x = xᵀw.    (2)
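As a minimal sketch of Eqs. (1) and (2), assuming a two-class problem with labels coded as ±1 and hypothetical function names (neither is specified in the text):

    import numpy as np

    def fit_full_relational_discriminant(D, labels):
        # Solve D w = Lambda exactly, Eq. (1); this requires rank(D) = m,
        # i.e. a non-singular square proximity matrix over all m training objects.
        return np.linalg.solve(D, labels)

    def estimate_label(x, w):
        # Eq. (2): lambda_x = x^T w, where x holds the similarities of a new
        # object to the m training objects; its sign gives the class for +/-1 labels.
        return x @ w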

The problem with this discriminant is that it may generalize poorly, as it has no capacity for averaging out noise. If a smaller representation set R is used, containing n ≪ m objects, then the corresponding matrix D_r has size m×n and the weight vector is reduced to n weights. Eq. (1) can now be solved by a mean-square-error procedure (Fisher's linear discriminant), but other discriminant functions may be used as well.
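The sketch below shows one way to realize such a mean-square-error fit on the reduced matrix; the function name and the ±1 label coding are again assumptions:

    import numpy as np

    def fit_reduced_relational_discriminant(D_r, labels):
        # D_r is the m x n matrix of similarities of all m training objects to
        # the n objects of the representation set R, with n << m.  A
        # mean-square-error fit of the n weights replaces the exact solution of Eq. (1).
        w, _, _, _ = np.linalg.lstsq(D_r, labels, rcond=None)
        return w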

A dimension reduction is important for two reasons. First, due to the curse of dimensionality (Jain and Chandrasekaran, 1987), the accuracy of a classifier trained on m objects improves if the objects are represented in a space of dimensionality n ≪ m. Secondly, it reduces the cost of measurements and computations when classifying new objects, as they need to be represented only by their similarities to the objects in R. In this way, the representation set replaces the traditional feature set.

In this paper we will show that, in a practical application, the selection of objects is (almost) as good as feature extraction by linear as well as non-linear methods for multi-dimensional scaling. Next we will analyze how critical the choice of the representation set R is. We will show that it is hard to improve on the performance of a simple random selection. This makes relational discriminant analysis a very simple procedure.

Section snippets

Dimension reduction by the selection of objects

As stated, the reduction of the number of columns in D from m to n by selecting a representation set is almost identical to the feature selection problem. There is, however, one important difference between the similarities to a representation set and a feature set: features can differ widely; some features may even be unique. The similarities to the objects in the representation set, on the other hand, have a uniform interpretation. They can be very similar, as they arise from the same …
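As an illustration of such a selection, a minimal sketch of drawing a random representation set from the full proximity matrix (the function name and the unstratified uniform draw are assumptions; in the experiments the draw is arranged so that all classes are represented):

    import numpy as np

    def random_representation_set(D, n, seed=0):
        # Keep n of the m training objects as the representation set R and
        # retain only the corresponding columns of the full m x m matrix D.
        rng = np.random.default_rng(seed)
        idx = rng.choice(D.shape[1], size=n, replace=False)
        return D[:, idx], idx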

Feature extraction by multi-dimensional scaling

Instead of selecting objects, a dimension reduction in the relational representation can be achieved by mapping the original set of objects onto a feature space of a given reduced dimensionality. In order to preserve the original structure defined by the proximity matrix, we demand that the distances in the new space reflect these proximities as well as possible. This technique is called multi-dimensional scaling (Borg and Groenen, 1997). Because this technique is based on proximity …
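The text does not state which scaling variants were evaluated beyond the reference to Borg and Groenen (1997); as a sketch, the linear (classical, Torgerson) variant can be written as follows, assuming D contains dissimilarities:

    import numpy as np

    def classical_mds(D, k):
        # Embed the m objects in k dimensions such that Euclidean distances
        # approximate the given dissimilarities in the m x m matrix D.
        m = D.shape[0]
        J = np.eye(m) - np.ones((m, m)) / m     # centering matrix
        B = -0.5 * J @ (D ** 2) @ J             # double-centred squared distances
        eigval, eigvec = np.linalg.eigh(B)      # eigenvalues in ascending order
        top = np.argsort(eigval)[::-1][:k]      # indices of the k largest ones
        scale = np.sqrt(np.clip(eigval[top], 0.0, None))
        return eigvec[:, top] * scale           # m x k configuration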

Experiments

We used a character database consisting of 200×10 handwritten numerals (200 objects per class, 10 classes), each originally represented by 30×48 binary pixels. Out of this dataset, 5 different feature sets are derived: pixel (240 averages of 2×3 pixel windows), face (216 face distances), Fourier (76 Fourier shape descriptors), Karhunen–Loève (64 weights) and Zernike (47 rotation-invariant moments plus 6 morphological features). In total there are 649 features. See also (van Breukelen et al., 1997).

For each feature set a 2000×2000 …
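A 2000×2000 proximity matrix for each feature set could, for example, be computed as sketched below; the Euclidean distance and the function name are assumptions, as the excerpt does not specify the proximity measure used:

    import numpy as np

    def proximity_matrix(X):
        # X is the 2000 x d data matrix of one feature set; the result is the
        # 2000 x 2000 matrix of pairwise Euclidean distances between objects.
        sq = np.sum(X ** 2, axis=1)
        D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.sqrt(np.clip(D2, 0.0, None))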

Discussion

Table 1 shows that if all classes and all datasets are represented in the subset, a random selection yields almost the same performance as any of the more advanced selection methods and mapping techniques. The poor results of the NN classifier for the feature extraction methods may be explained by the fact that multi-dimensional scaling optimizes the set of distances globally, thereby affecting the NN relations.

Note that the random subset selection needs a training effort of about 1 minute on a Sun …

Discussion

Raghavan: When we were both here during the previous conference in the “Pattern Recognition in Practice” series we were talking about a paper by Lev Goldfarb, somewhat related, I think, to multi-dimensional scaling. (Note of the editors: see the discussion in: R.P.W. Duin, D. de Ridder and D.M.J. Tax. Experiments with a Featureless Approach to Pattern Recognition, Pattern Recognition Letters, 18, 1997, pp. 1159–1166.) The results of that paper allowed the type of distances to be more general …

