Dissimilarity representations allow for building good classifiers
Introduction
The challenge of automatic pattern recognition is to develop computer methods which learn to distinguish among a number of classes represented by examples. First, an appropriate representation of objects should be found. Then, a decision rule can be constructed, which discriminates between different categories and which is able to generalize well (achieve a high accuracy when novel examples appear). One of the possible representations is based on similarity or dissimilarity relations between objects. When properly defined, it might be advantageous for solving class identification problems. Such a recommendation is supported by the fact that (dis)similarities can be considered as a connection between perception and higher-level knowledge, being a crucial factor in the process of human recognition and categorization (Goldstone, 1999; Edelman, 1999; Wharton et al., 1992).
In contrast to this observation, objects are conventionally represented by characteristic features (Duda et al., 2001). In some cases, however, a feasible feature-based description of objects might be difficult to obtain or inefficient for learning purposes, e.g., when experts cannot define features in a straightforward way, when data are high dimensional, or when features consist of both continuous and categorical variables. Then, the use of dissimilarities, built directly on measurements, e.g., based on template matching, is an appealing alternative. Also, in some applications, e.g., 2D shape recognition (Edelman, 1999), the use of dissimilarities makes the problem more tractable.
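To make the idea concrete, a dissimilarity representation can be built directly on raw measurements; the sketch below uses plain pairwise Euclidean distances as a simple stand-in for template matching (the function name and toy data are our illustration, not taken from the paper):

```python
import numpy as np

def dissimilarity_matrix(A, B):
    """Pairwise Euclidean dissimilarities between raw measurement
    vectors (rows of A) and template objects (rows of B)."""
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, clipped to avoid
    # tiny negative values from floating-point round-off
    sq = (A**2).sum(1)[:, None] - 2 * A @ B.T + (B**2).sum(1)[None, :]
    return np.sqrt(np.maximum(sq, 0.0))

# two 2-D measurement vectors matched against one template
A = np.array([[0.0, 0.0], [1.0, 1.0]])
B = np.array([[0.1, 0.0]])
D = dissimilarity_matrix(A, B)   # shape (2, 1)
```

Any other measure (e.g., a modified Hausdorff distance between shapes) can take the place of the Euclidean distance here; the learning methods below only see the resulting matrix.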
The nearest neighbor method (NN) (Cover and Hart, 1967) is traditionally applied to dissimilarity representations. Although this decision rule is based on local neighborhoods, i.e., one or a few neighbors, it is still computationally expensive, since dissimilarities to all training examples have to be computed. Another drawback is that its performance tends to deteriorate when the training set is small. To overcome these limitations and improve the recognition accuracy, we propose to replace this method by a more global decision rule. Such a classifier is constructed from a training set represented by its dissimilarities to a set of prototypes, called the representation set. If this set is small, only a small number of dissimilarities has to be computed to evaluate a new object, while the classifier may still profit from the accuracy offered by a large training set.
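For contrast with the global rule proposed here, the 1-NN decision on a dissimilarity representation reduces to an argmin over each new object's dissimilarities to the prototypes; a minimal sketch (names and data illustrative):

```python
import numpy as np

def nn_classify(D_new, proto_labels):
    """1-NN rule on dissimilarities: each new object (row of the
    s x r matrix D_new) receives the label of its nearest prototype."""
    return proto_labels[np.argmin(D_new, axis=1)]

# three prototypes, two classes; one new object closest to prototype 0
proto_labels = np.array([0, 0, 1])
D_new = np.array([[0.2, 0.5, 0.9]])
```

Note that evaluating s new objects still requires all s x r dissimilarities, which is why keeping the representation set small matters for both rules.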
Throughout this paper, all our investigations are devoted to dissimilarity representations, assuming that no other representations (e.g., features) are available to the researcher. The goal of this work is to propose a novel, advantageous approach to learning only from dissimilarity (distance) representations, dealing with classification problems in particular. Our experiments will demonstrate that the tradeoff between the recognition accuracy and the computational effort is significantly improved by using a normal density-based classifier built on dissimilarities instead of the NN rule. This paper is organized as follows. In Section 2, a more detailed description of dissimilarity representations and the decision rules considered is given. Section 3 describes the datasets used and the experiments conducted. The results are discussed in Section 4 and the conclusions are summarized in Section 5. The essential idea of this paper has been published in Electronics Letters (Pękalska, 2001). Some earlier elements of the presented research can be found in (Duin et al., 1999; Pękalska and Duin, 2000).
Section snippets
Learning from dissimilarities
To construct a classifier on dissimilarities, the training set T of size n (having n objects) and the representation set R (Duin, 2000) of size r will be used. R is a set of prototypes covering all classes present. R is chosen to be a subset of T (R⊆T), although, in general, R and T might be disjoint. In the learning process, a classifier is built on the n×r distance matrix D(T,R), relating all training objects to all prototypes. The information on a set S of s new objects is provided in terms
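A minimal sketch of such a classifier: the rows of D(T,R) are treated as r-dimensional feature vectors, and a linear normal-density-based rule is fitted on them, in the spirit of the RLNC, assuming two Gaussian classes with a shared covariance. The regularization constant and all names below are our assumptions, not the paper's implementation:

```python
import numpy as np

def fit_linear_normal(D, y):
    """Two-class linear normal-density-based classifier built on the
    n x r dissimilarity matrix D(T, R); assumes labels y in {0, 1}
    and a shared class covariance, lightly regularized (our choice)."""
    m0, m1 = D[y == 0].mean(axis=0), D[y == 1].mean(axis=0)
    centered = np.vstack([D[y == 0] - m0, D[y == 1] - m1])
    C = centered.T @ centered / len(D) + 1e-6 * np.eye(D.shape[1])
    w = np.linalg.solve(C, m1 - m0)   # Fisher-type linear direction
    b = -0.5 * (m0 + m1) @ w          # threshold at the class midpoint
    return w, b                       # assign class 1 iff d_new @ w + b > 0
```

Unlike the NN rule, this classifier uses all n training rows to estimate the means and covariance, yet evaluating a new object needs only its r dissimilarities to the prototypes.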
Datasets and the experimental set-up
A number of experiments are conducted to compare the results of the k-NN rule and the RLNC/RQNC built on dissimilarities. They are designed to observe and analyze the behavior of these classifiers in relation to different sizes of the representation and training sets. Smaller representation sets are of interest because of the lower cost of representing and evaluating new objects, which matters both for storage and for computation. Our concern is then how
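The random selection of the representation set used in these comparisons can be sketched as drawing a fixed number of training objects per class (a hypothetical helper; the MD criterion is a separate, deterministic selection not shown here):

```python
import numpy as np

def random_representation_set(y, r_per_class, rng):
    """Draw r_per_class prototype indices per class at random from the
    training set; a classifier is then built on D(T, R) = D[:, idx]."""
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=r_per_class, replace=False)
        for c in np.unique(y)])
    return np.sort(idx)
```

Repeating the experiment over several such random draws, and over a range of r_per_class values, gives the accuracy-versus-cost curves compared in the results.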
Results
The generalization error rates of the k-NN rule and the RLNC/RQNC for three datasets are presented in Fig. 3, Fig. 4 and Fig. 5. The k-NN results, marked by stars, are plotted on the rc=nc line. The results depend either on the random selection of the representation set (left subplots) or on the MD criterion (right subplots). Since the k-NN results are worse in case of the MD selection, the k-NN results always refer to the random selection (also in the right subplots). The RLNC's (RQNC's)
Discussion and conclusions
Our experiments confirm that the RLNC constructed on the dissimilarity representation D(T,R) nearly always outperforms the k-NN rule based on the same R. This holds for the RQNC as well, provided that each class is represented by a sufficient number of objects. Since the computational complexity of evaluating new objects (here mainly determined by the number of prototypes, as explained in Section 2.3) is an important issue, our study emphasizes this aspect. We have found out that for
Acknowledgements
This work is partly supported by the Dutch Organization for Scientific Research (NWO).
References (20)
- D.W. Aha, D. Kibler, M.K. Albert, Instance-based learning algorithms, Mach. Learning (1991)
- T.M. Cover, P.E. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory (1967)
- P.A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach (1982)
- M.-P. Dubuisson, A.K. Jain, A modified Hausdorff distance for object matching (1994)
- R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification (2001)
- R.P.W. Duin et al., Relational discriminant analysis, Pattern Recognition Lett. (1999)
- R.P.W. Duin, Classifiers in almost empty spaces (2000)
- S. Edelman, Representation and Recognition in Vision (1999)
- K. Fukunaga, Introduction to Statistical Pattern Recognition (1990)
- R.L. Goldstone, Similarity (1999)