
Pattern Recognition Letters

Volume 32, Issue 6, 15 April 2011, Pages 816-823

An empirical evaluation on dimensionality reduction schemes for dissimilarity-based classifications

https://doi.org/10.1016/j.patrec.2011.01.009

Abstract

This paper presents an empirical evaluation of methods for reducing the dimensionality of dissimilarity spaces in order to optimize dissimilarity-based classifications (DBCs). One problem of DBCs is the high dimensionality of the dissimilarity spaces. To address this problem, two kinds of solutions have been proposed in the literature: prototype selection (PS) based methods and dimension reduction (DR) based methods. Although PS-based and DR-based methods have been explored separately by many researchers, little work has been done to compare the two directly. Therefore, this paper aims to find a suitable method for optimizing DBCs through a comparative study. Our empirical evaluation, obtained with the two approaches for an artificial data set and three real-life benchmark databases, demonstrates that DR-based methods, such as principal component analysis (PCA) and linear discriminant analysis (LDA) based methods, generally improve the classification accuracies more than PS-based methods. In particular, the experimental results demonstrate that PCA is more useful for well-represented data sets, while LDA is more helpful for small sample size (SSS) problems.

Research highlights

► Prototype selection (PS) and dimension reduction (DR) based methods are compared. ► The two approaches are evaluated on an artificial data set and three real-life benchmark data sets. ► DR-based methods generally outperform PS-based methods in terms of classification accuracy. ► The PCA-based method is more useful for well-represented data sets. ► The LDA-based method works better for small sample size (SSS) problems.

Introduction

One of the most recent and novel developments in the field of statistical pattern recognition (PR) (Jain et al., 2000) is the concept of dissimilarity-based classifications (DBCs) proposed by Duin and his co-authors (Pekalska and Duin, 2005). DBCs are a way of defining classifiers among the classes; the process is based not on the feature measurements of individual object samples, but rather on a suitable dissimilarity measure among them. The three major questions encountered when designing DBCs can be summarized as follows: (1) How can prototype subsets be selected (or created) from the training samples? (2) How can the dissimilarities between object samples be measured? (3) How can a classifier be designed in the dissimilarity space?

Numerous strategies have been used to explore these questions (Kim and Oommen, 2007, Pekalska and Duin, 2005, Pekalska et al., 2006). First of all, the prototype subset can be selected in a simple way by using all of the input vectors as prototypes. In most cases, however, this strategy imposes a computational burden on the classifier. To select a prototype subset that is compact and yet capable of representing the entire data set, Duin and his colleagues (Pekalska and Duin, 2005, Pekalska et al., 2006) discussed a number of methods in which a training set is pruned to yield a set of representation prototypes. On the other hand, by invoking a prototype reduction scheme (PRS), Kim and Oommen (2007) also obtained a representative subset, which is utilized by the DBC. In addition to using PRSs, the same authors proposed the use of the Mahalanobis distance as the dissimilarity measure.

With regard to the second question, investigations have focused on measuring the appropriate dissimilarity by using various lp norms, the modified Hausdorff norms, and traditional PR-based measures, such as those used in template matching and correlation-based analysis (Pekalska and Duin, 2005, Pekalska et al., 2006). Finally, the third question refers to the learning paradigms, especially those which deal with either parametric or nonparametric classifiers. On this subject, the use of many traditional decision classifiers, including the k-NN rule and linear/quadratic normal-density-based classifiers, has been reported (Pekalska and Duin, 2005). In the interest of brevity, the details of the second and third questions are omitted here; they can be found in the relevant literature.

In DBCs, a good selection of prototypes appears to be crucial for the classification algorithm to succeed in the dissimilarity space. The prototypes should avoid redundancy caused by selecting similar samples, and should include as much information as possible. However, it is difficult to determine the optimal number of prototypes. Furthermore, there is a possibility of losing useful discriminative information when selecting prototypes (Bicego et al., 2004, Bunke and Riesen, 2007, Kim and Gao, 2008, Riesen et al., 2007). To avoid these problems, the authors of Bicego et al., 2004, Bunke and Riesen, 2007, Kim and Gao, 2008, Riesen et al., 2007 separately proposed an alternative approach in which all of the available samples are selected as prototypes and, subsequently, a dimension reduction scheme, such as principal component analysis (PCA) or linear discriminant analysis (LDA), is applied to reduce the dimensionality. That is, rather than directly selecting representation prototypes from the training samples, they proposed applying a dimension reduction scheme after computing the dissimilarity matrix with the entire set of training samples. This approach appears more principled and completely avoids the problem of finding the optimal number of prototypes.

In this paper, we perform an empirical evaluation of the two approaches to reducing the dimensionality of dissimilarity spaces for optimizing DBCs: prototype selection (PS) based methods and dimension reduction (DR) based methods. In PS-based methods, we first select the representation prototypes from the training data set by resorting to one of the prototype selection methods described in Kim and Oommen, 2007, Pekalska and Duin, 2005, Pekalska et al., 2006. Then, we compute the dissimilarity matrix, in which each individual dissimilarity is computed on the basis of the measures described in Pekalska et al. (2006). Finally, we perform the classification by invoking a classifier built in the dissimilarity space. In DR-based methods, on the other hand, we do not directly select the representation prototypes from the training samples; rather, we apply a DR scheme after computing the dissimilarity matrix with the entire set of training samples. A key point here is how to choose the number of prototypes and the dimensionality of the reduced subspace. In PS-based methods, we heuristically select the same number of (or twice as many) prototypes as the number of classes. In DR-based methods, on the other hand, we use a cumulative proportion technique (Laaksonen and Oja, 1996, Oja, 1983) to choose the subspace dimensions.
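To make the DR-based pipeline and the cumulative proportion criterion concrete, the following is a minimal sketch, not the authors' code: it uses the full n × n dissimilarity matrix as the representation, chooses the number of PCA components by cumulative proportion of variance, and classifies with a 1-NN rule in the reduced space. The Euclidean dissimilarity, the 0.95 threshold, and the 1-NN classifier are illustrative assumptions rather than the paper's specific settings.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def dr_based_dbc(X_train, y_train, X_test, var_threshold=0.95):
    # Use every training sample as a prototype: n x n dissimilarity matrix.
    D_train = cdist(X_train, X_train, metric='euclidean')
    D_test = cdist(X_test, X_train, metric='euclidean')

    # Choose the subspace dimensionality by the cumulative proportion of variance.
    pca_full = PCA().fit(D_train)
    cum = np.cumsum(pca_full.explained_variance_ratio_)
    k = int(np.searchsorted(cum, var_threshold) + 1)

    # Project the dissimilarity vectors onto the k leading principal components.
    pca_k = PCA(n_components=k).fit(D_train)
    Z_train, Z_test = pca_k.transform(D_train), pca_k.transform(D_test)

    # Any classifier built in the reduced dissimilarity space could be used here.
    clf = KNeighborsClassifier(n_neighbors=1).fit(Z_train, y_train)
    return clf.predict(Z_test), k
```

A PS-based method would instead replace the full matrix by an n × m matrix computed against a small prototype set (for example, one or two prototypes per class, as described above) and skip the projection step.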

The main contribution of this paper is an empirical evaluation of the two approaches to reducing the dimensionality of dissimilarity spaces for optimizing DBCs. This evaluation shows that DBCs can be optimized by employing a dimension reduction scheme as well as a prototype selection method. The aim of using a dimension reduction scheme instead of selecting prototypes is to retain useful discriminative information and to avoid the problem of finding the optimal number of prototypes. Our experimental results, obtained with the two approaches on an artificial data set and three real-life benchmark databases, demonstrate that DR-based methods, such as PCA- and LDA-based methods, can generally improve the classification accuracy of DBCs more than PS-based methods.

The remainder of the paper is organized as follows: In Section 2, after a brief overview of dissimilarity representation and prototype selection methods, we describe how dimension reduction schemes can be used instead. Following this, in Section 3, we present the experimental set-up for the prototype selection based methods and the dimension reduction based methods. In Section 4, we present the experimental results for an artificial data set and three real-life image databases. Finally, in Section 5, we present our concluding remarks.

Section snippets

Foundations of DBCs

A dissimilarity representation of a set of samples, T = {x_i}_{i=1}^{n} ⊆ ℝ^d, is based on pairwise comparisons and is expressed, for example, as an n × m dissimilarity matrix, D_{T,Y}[·, ·], where Y = {y_j}_{j=1}^{m} ⊆ ℝ^d, a prototype set, is extracted from T, and the subscripts of D represent the sets of elements on which the dissimilarities are evaluated. Thus, each entry, D_{T,Y}[i, j], corresponds to the dissimilarity between the pair of objects x_i and y_j, where x_i ∈ T and y_j ∈ Y. Consequently, an object, x_i, is represented
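To make the notation concrete, the following is a minimal sketch of computing the n × m matrix D_{T,Y}; the Euclidean distance as the dissimilarity measure and the random choice of the prototype set Y are illustrative assumptions, not the paper's specific settings.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
T = rng.normal(size=(100, 8))                       # training set T = {x_i}, n = 100, d = 8
Y = T[rng.choice(len(T), size=10, replace=False)]   # prototype set Y = {y_j}, m = 10

# n x m dissimilarity matrix D_{T,Y}: entry [i, j] is the dissimilarity between x_i and y_j.
D = cdist(T, Y, metric='euclidean')

# Each object x_i is then represented by its row of dissimilarities to the m prototypes.
delta_x0 = D[0]   # dissimilarity-space representation of x_0 (an m-dimensional vector)
```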

Experimental data

PS-based and DR-based methods were tested and compared with each other by conducting experiments on an artificial data set (referred to as XOR4: 4-dimensional Exclusive OR), a digit image data set, Nist38 (Wilson and Garris, 1992), and two well-known face databases, namely AT&T (Samaria et al., 1994) and Yale (Georghiades et al., 2001).

The data set named XOR4, which has been included in the experiment as a baseline data set, was generated from a mixture of four 4-dimensional
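The paper's exact mixture parameters are not shown in this excerpt; purely for illustration, the following sketch builds an XOR-style mixture of four 4-dimensional Gaussians in which diagonally opposite components share a class label, so that the two classes have identical means and are not linearly separable. All means, covariances, and sample sizes below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed XOR-style construction: opposite hypercube corners share a class label.
means = np.array([[0, 0, 0, 0], [1, 1, 1, 1],     # class 0
                  [1, 1, 0, 0], [0, 0, 1, 1]],    # class 1
                 dtype=float)
labels = np.array([0, 0, 1, 1])

n_per, sigma = 50, 0.2
X = np.vstack([rng.normal(m, sigma, size=(n_per, 4)) for m in means])
y = np.repeat(labels, n_per)
```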

Experimental results

The run-time characteristics of the empirical evaluation on the four data sets are reported below and shown in figures and tables. We first investigate the rationale for employing a PS or a DR method to reduce the dimensionality. Then, we present the classification accuracies of the PS- and DR-based methods for an artificial data set and three real-life databases. Based on the classification results, we then rank the methods. Finally, we introduce a comparison of the

Conclusions

In order to reduce the dimensionality of dissimilarity spaces for optimizing dissimilarity-based classifications (DBCs), two kinds of approaches, prototype selection (PS) based methods and dimension reduction (DR) based methods, have been explored separately in the literature. The aim of this paper is to empirically evaluate the two in terms of classification accuracy on an artificial data set and three real-life benchmark databases. It is a well-known fact that classification algorithms based on

Acknowledgments

The authors thank the anonymous Referees for their valuable comments, which improved the quality and readability of the paper.


This work was supported by the National Research Foundation of Korea funded by the Korean Government (NRF-2010-0015829). Preliminary versions of this paper were partially presented at CIARP 2008, the 13th Iberoamerican Congress on Pattern Recognition, Havana, Cuba, September 9–12, 2008, and at ICAART 2010, the International Conference on Agents and Artificial Intelligence, Valencia, Spain, January 22–24, 2010.
