Dissimilarity representations allow for building good classifiers

https://doi.org/10.1016/S0167-8655(02)00024-7

Abstract

In this paper, a classification task on dissimilarity representations is considered. A traditional way to discriminate between objects represented by dissimilarities is the nearest neighbor method. It suffers, however, from a number of limitations: high computational complexity, a potential loss of accuracy when a small set of prototypes is used, and sensitivity to noise. To overcome these shortcomings, we propose to use a normal density-based classifier constructed on the same representation. We show that such a classifier, based on a weighted combination of dissimilarities, can significantly improve on the nearest neighbor rule with respect to both recognition accuracy and computational effort.

Introduction

The challenge of automatic pattern recognition is to develop computer methods which learn to distinguish among a number of classes represented by examples. First, an appropriate representation of objects should be found. Then, a decision rule can be constructed, which discriminates between different categories and which is able to generalize well (achieve a high accuracy when novel examples appear). One of the possible representations is based on similarity or dissimilarity relations between objects. When properly defined, it might be advantageous for solving class identification problems. Such a recommendation is supported by the fact that (dis)similarities can be considered as a connection between perception and higher-level knowledge, being a crucial factor in the process of human recognition and categorization (Goldstone, 1999; Edelman, 1999; Wharton et al., 1992).

In contrast to this observation, objects are conventionally represented by characteristic features (Duda et al., 2001). In some cases, however, a feasible feature-based description of objects might be difficult to obtain or inefficient for learning purposes, e.g., when experts cannot define features in a straightforward way, when data are high dimensional, or when features consist of both continuous and categorical variables. Then, the use of dissimilarities, built directly on measurements, e.g., based on template matching, is an appealing alternative. Also, in some applications, e.g., 2D shape recognition (Edelman, 1999), the use of dissimilarities makes the problem more viable.

The nearest neighbor method (NN) (Cover and Hart, 1967) is traditionally applied to dissimilarity representations. Although this decision rule is based on local neighborhoods, i.e., one or a few neighbors, it is still computationally expensive, since the dissimilarities to all training examples have to be computed. Another drawback is that its performance may deteriorate when the training set is small. To overcome such limitations and improve the recognition accuracy, we propose to replace this method by a more global decision rule. Such a classifier is constructed from a training set represented by the dissimilarities to a set of prototypes, called the representation set. If this set is small, only a few dissimilarities have to be computed to evaluate a new object, while the classifier may still profit from the accuracy offered by a large training set.
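To make the computational argument concrete, the following is a minimal sketch of the k-NN rule operating on a precomputed dissimilarity representation. The function name and the majority-vote tie handling are our illustrative assumptions, not the paper's implementation; the point it shows is that classifying a single new object requires its dissimilarities to all n training objects.

```python
import numpy as np

def knn_on_dissimilarities(d_new, train_labels, k=1):
    """k-NN rule on a dissimilarity representation (illustrative sketch).

    d_new        : (n,) dissimilarities from one new object to ALL n
                   training objects -- every one must be computed,
                   which is the cost the paper seeks to reduce.
    train_labels : (n,) class labels of the training objects.
    """
    nearest = np.argsort(d_new)[:k]        # indices of the k smallest dissimilarities
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]       # majority vote (ties broken arbitrarily)
```

A more global classifier, as proposed below, replaces this per-object scan over the whole training set by a fixed set of r prototypes.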

Throughout this paper, all our investigations are devoted to dissimilarity representations, assuming that no other representations (e.g., features) are available to the researcher. The goal of this work is to propose a novel, advantageous approach to learning only from dissimilarity (distance) representations, dealing with classification problems in particular. Our experiments will demonstrate that the tradeoff between recognition accuracy and computational effort is significantly improved by using a normal density-based classifier built on dissimilarities instead of the NN rule. This paper is organized as follows. Section 2 gives a more detailed description of dissimilarity representations and the decision rules considered. Section 3 describes the datasets used and the experiments conducted. The results are discussed in Section 4 and the conclusions are summarized in Section 5. The essential idea of this paper has been published in Electronics Letters (Pękalska, 2001). Some earlier elements of the presented research can be found in Duin et al. (1999) and Pękalska and Duin (2000).

Learning from dissimilarities

To construct a classifier on dissimilarities, the training set T of n objects and the representation set R (Duin, 2000) of size r will be used. R is a set of prototypes covering all classes present. R is chosen to be a subset of T (R ⊆ T), although, in general, R and T might be disjoint. In the learning process, a classifier is built on the n×r distance matrix D(T,R), relating all training objects to all prototypes. The information on a set S of s new objects is provided in terms of their dissimilarities to the prototypes in R, i.e., the s×r matrix D(S,R).
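As an illustration of this construction, the sketch below builds a linear normal-density-based classifier on D(T,R), treating the r dissimilarities to the prototypes as ordinary features. It is a minimal reconstruction under common assumptions (an equal-covariance Gaussian model with a ridge-regularized pooled covariance); the paper's exact RLNC regularization may differ.

```python
import numpy as np

class RLNC:
    """Sketch of a regularized linear normal-density-based classifier
    built on a dissimilarity representation D(T, R): the r dissimilarities
    of an object to the prototypes in R are treated as ordinary features.
    The ridge regularization below is our assumption, not necessarily
    the paper's exact choice."""

    def __init__(self, reg=0.01):
        self.reg = reg  # ridge strength for the pooled covariance

    def fit(self, D, y):
        # D: (n, r) dissimilarity matrix D(T, R);  y: (n,) class labels
        self.classes_ = np.unique(y)
        n, r = D.shape
        self.means_ = np.stack([D[y == c].mean(axis=0) for c in self.classes_])
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        # Pooled within-class covariance (equal-covariance Gaussian model)
        scatter = sum(np.cov(D[y == c], rowvar=False) * (np.sum(y == c) - 1)
                      for c in self.classes_)
        S = scatter / (n - len(self.classes_))
        S += self.reg * np.trace(S) / r * np.eye(r)  # regularize toward identity
        self.S_inv_ = np.linalg.inv(S)
        return self

    def predict(self, D_new):
        # Linear discriminant: g_c(x) = x' S^-1 m_c - 0.5 m_c' S^-1 m_c + log p_c
        lin = D_new @ self.S_inv_ @ self.means_.T             # (m, C)
        quad = 0.5 * np.einsum('cr,rs,cs->c', self.means_, self.S_inv_, self.means_)
        G = lin - quad + np.log(self.priors_)
        return self.classes_[np.argmax(G, axis=1)]
```

The resulting decision is linear in the dissimilarities, i.e., a weighted combination of them, as stated in the abstract; evaluating a new object then needs only its r dissimilarities to the prototypes rather than dissimilarities to all n training objects.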

Datasets and the experimental set-up

A number of experiments are conducted to compare the results of the k-NN rule and the RLNC/RQNC built on dissimilarities. They are designed to observe and analyze the behavior of these classifiers in relation to different sizes of the representation and training sets. Small representation sets are of interest because they lower the complexity of representing and evaluating new objects. This matters both for storage and for computational cost. Our concern is then how …
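One run of such an experiment can be sketched as follows, reusing the RLNC class from the previous sketch. The random selection of the representation set follows the paper; the function names and the simple test-error estimate are our illustrative assumptions, and the paper's alternative MD selection criterion (defined in the full text) is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_random_R(D_train, y_train, D_test, y_test, r):
    """Estimate the test error of the RLNC for a randomly chosen
    representation set R of r prototypes (illustrative sketch).

    D_train : (n, n) dissimilarities among the n training objects
    D_test  : (m, n) dissimilarities from m test objects to the training objects
    """
    idx = rng.choice(D_train.shape[0], size=r, replace=False)  # random R, a subset of T
    clf = RLNC().fit(D_train[:, idx], y_train)   # trained on D(T, R)
    y_hat = clf.predict(D_test[:, idx])          # only r dissimilarities per new object
    return float(np.mean(y_hat != y_test))
```

Repeating this over a range of r, and over several random draws of R, yields error curves of the kind discussed in the results section.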

Results

The generalization error rates of the k-NN rule and the RLNC/RQNC for three datasets are presented in Figs. 3–5. The k-NN results, marked by stars (∗), are presented on the line r_c = n_c. The results depend either on the random selection of the representation set (left subplots) or on the MD criterion (right subplots). Since the k-NN results are worse in the case of the MD selection, the k-NN results always refer to the random selection (also in the right subplots). The RLNC's (RQNC's) …

Discussion and conclusions

Our experiments confirm that the RLNC constructed on the dissimilarity representations D(T,R) nearly always outperforms the k-NN rule based on the same R. This holds for the RQNC as well, provided that each class is represented by a sufficient number of objects. Since the computational complexity of evaluating new objects (here mainly indicated by the number of prototypes, as explained in Section 2.3) is an important issue, our study is conducted with this emphasis. We have found that for …

Acknowledgements

This work is partly supported by the Dutch Organization for Scientific Research (NWO).

References

  • R.P.W. Duin et al., Relational discriminant analysis, Pattern Recognition Lett. (1999)
  • D.W. Aha et al., Instance-based learning algorithms, Mach. Learning (1991)
  • T.M. Cover et al., Nearest neighbor pattern classification, IEEE Trans. Inf. Theory (1967)
  • P.A. Devijver et al., Pattern Recognition: A Statistical Approach (1982)
  • M.P. Dubuisson et al., Modified Hausdorff distance for object matching, Proc. Int. Conf. on Pattern Recognition (1994)
  • R.O. Duda et al., Pattern Classification (2001)
  • R.P.W. Duin, Classifiers in almost empty spaces, Proc. Int. Conf. on Pattern Recognition (2000)
  • S. Edelman, Representation and Recognition in Vision (1999)
  • K. Fukunaga, Introduction to Statistical Pattern Recognition (1990)
  • R.L. Goldstone, Similarity (1999)
