Simple and interpretable discrimination

https://doi.org/10.1016/j.csda.2008.11.018

Abstract

A number of approaches have been proposed for constructing alternatives to principal components that are more easily interpretable while still explaining a considerable part of the data variability. One such approach is employed here to produce interpretable canonical variates and to explore their discrimination behavior, which is more complicated than in PCA because orthogonality with respect to the within-groups sums-of-squares matrix is involved. The proposed simple and interpretable canonical variates are an optimal compromise between goodness and sparseness of the approximation to the original ones, rather than an identification of the variables that dominate the discrimination. The numerical algorithms have low computational cost and are illustrated on Fisher’s iris data and on moderately large real data.

Introduction

Discriminant analysis (DA) is a descriptive multivariate technique for analyzing grouped data, i.e. data where the observations are divided into a number of groups that usually represent samples from different populations (Fisher, 1936). Recently, DA has also been viewed as a promising dimensionality reduction technique; indeed, the presence of group structure in the data further facilitates dimensionality reduction. The best known variety of DA is linear discriminant analysis (LDA), whose central goal is to describe the differences between the groups in terms of canonical variates, which are linear combinations of the original variables (Fisher, 1936). LDA requires solving a generalized eigenvalue problem (Golub and Van Loan, 1996, Section 8.7).
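As a concrete illustration of this last point, the following is our sketch rather than code from the paper: with B and W denoting the between-groups and within-groups sums-of-squares matrices introduced in Section 2, the loadings vectors of the canonical variates solve the generalized eigenproblem $Ba = \lambda Wa$, which a symmetric-definite solver handles directly. The function and variable names are ours.

```python
# A minimal sketch (ours, not the paper's): Fisher's canonical variates as
# the generalized symmetric eigenproblem  B a = lambda * W a.
import numpy as np
from scipy.linalg import eigh

def lda_directions(B, W):
    """Loadings vectors of the canonical variates, strongest first.

    B, W : (p x p) between- and within-groups sums-of-squares matrices;
    W must be positive definite (generically guaranteed when n > p).
    """
    lam, A = eigh(B, W)              # solves B a = lam W a
    order = np.argsort(lam)[::-1]    # largest between/within ratio first
    return lam[order], A[:, order]
```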

The interpretation of the canonical variates is based on the coefficients of the original variables in the linear combinations. The interpretation can be clear and obvious if the coefficients in the loadings vectors take one of a small number of values, including exact zero. Unfortunately, in many applications this is not the case. The interpretation problem is exacerbated by the fact that there are three types of coefficients (raw, standardized and structure) that can be used to describe the canonical variates (Rencher, 1992, Trendafilov and Jolliffe, 2007).

There are several approaches to the interpretation of the canonical variates, each of which has disadvantages (Rencher, 1992, Trendafilov and Jolliffe, 2007). For example, a modification of LDA, aiming for better discrimination and possibly interpretation, is considered in Krzanowski (1995). In this approach the vectors of coefficients in the canonical variates are constrained to be orthogonal.

The difficulties with interpreting canonical variates are similar to those encountered when interpreting principal components (Jolliffe, 2002, Section 11). Recently, there has been an increasing interest in approximate, but computationally simple and easy to interpret, versions of principal component analysis (PCA) (Chipman and Gu, 2005, Jolliffe and Uddin, 2000, Jolliffe et al., 2003, Rousson and Gasser, 2004, Vines, 2000). Roughly speaking, the aim of most of these works is to find “components” whose loadings vectors include exact zero coefficients. The motivation is twofold: such simplified PCA aids interpretation, and exact zeros make it easier to deal with very large data sets. Thus it is of interest whether such simplified PCA techniques can be adapted for use in LDA.

In this paper, we employ and adapt the idea of Chipman and Gu (2005). Thus, we seek approximate canonical variates whose loadings vectors can be classified in one of the following three ways: homogeneous, contrast or sparse. The resulting approximate canonical variates should therefore be much easier to interpret than their exact counterparts. In short, the approximate canonical variates (a sketch of the simplification idea follows this list):

  • require low computational cost,

  • are easy to interpret because most of the loadings are 0s,

  • approximate the original canonical variates.
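As a rough, hedged sketch of what such a simplification step can look like (our illustration in the spirit of Chipman and Gu (2005), not their exact algorithm): given a loadings vector, one can search over all "homogeneous" vectors, i.e. unit-length vectors whose nonzero entries share a common magnitude, for the one closest to it in inner product.

```python
# A minimal sketch (ours, not Chipman and Gu's exact procedure): for a
# loadings vector `a`, sweep over sparsity levels k and pick the homogeneous
# vector (entries 0 or +/- 1/sqrt(k)) with the largest inner product with `a`.
import numpy as np

def closest_homogeneous(a):
    a = np.asarray(a, dtype=float)
    order = np.argsort(np.abs(a))[::-1]        # indices by decreasing |a_i|
    best, best_score = None, -np.inf
    for k in range(1, len(a) + 1):
        v = np.zeros_like(a)
        idx = order[:k]
        v[idx] = np.sign(a[idx]) / np.sqrt(k)  # unit length, equal magnitudes
        score = abs(v @ a)                     # closeness to the original
        if score > best_score:
            best, best_score = v, score
    return best, best_score
```

For a fixed sparsity level the k largest absolute loadings with matched signs are optimal, so this sweep finds the closest homogeneous vector overall; contrast and purely sparse patterns can be handled by analogous searches.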

This is in contrast to the DALASS canonical variates (Trendafilov and Jolliffe, 2007) whose loadings vectors have the largest entries corresponding to the variables that dominate the discrimination (also see Table 5).

In Section 2 we review the mathematical theory underlying discriminant analysis and in Section 3 illustrate the use of LDA, focusing on the interpretation of loadings vectors. The calculation of interpretable canonical variates via interpretable PCA is detailed in Section 4. The ideas of Section 4 are generalized to provide a more direct approach to interpretable LDA in Section 5. This new approach is applied to a couple of data sets in Section 6. Finally, some conclusions are given in Section 7.

Section snippets

Mathematical theory of canonical variates

Suppose that measurements on $p$ variables are made and recorded for a total of $n$ observations (cases). We assume that $n > p$. Further suppose that the $n$ observations are a priori divided into $g$ groups and let $n_i$ be the number of individuals in the $i$th group, i.e. $n_1 + n_2 + \cdots + n_g = n$. Also let the $(1 \times p)$ vector $x_{ij}$ denote the measurements made on the $j$th individual belonging to the $i$th group and let the $(n \times p)$ data matrix $X$ represent the measurements over all observations.

Consider the following linear
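The snippet breaks off here, but the setup above already suffices for a minimal computational sketch. The following is our illustration, not the paper's code: it builds the within-groups matrix W and the between-groups matrix B used throughout; the `labels` array encoding group membership and all names are ours.

```python
# A minimal sketch (ours), assuming the setup above: rows of X are the n
# observations on p variables, `labels` gives the group of each row.
import numpy as np

def scatter_matrices(X, labels):
    """Within-groups (W) and between-groups (B) sums-of-squares matrices."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    xbar = X.mean(axis=0)                 # grand mean
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(labels):
        Xg = X[labels == g]               # n_i x p block of group i
        mg = Xg.mean(axis=0)              # group mean
        W += (Xg - mg).T @ (Xg - mg)      # within-group scatter
        B += len(Xg) * np.outer(mg - xbar, mg - xbar)  # between-group scatter
    return W, B
```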

Applications of canonical variates

In this section two data sets, Fisher’s iris data (Fisher, 1936) and the haddock sound data of Wood et al. (2005), are analyzed using LDA to illustrate the difficulties in interpreting canonical variates.

Seeking interpretable canonical variates via interpretable PCA

One approach to producing interpretable canonical variates is to make use of a trick proposed by Dhillon et al. (2002). They suggested that the canonical variates can be taken to be the components arising out of standard PCA applied to the between-groups sums-of-squares matrix B. This approach is particularly appropriate when the data set is large as it reduces the computational burden.
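A minimal sketch of this trick follows (our code, not Dhillon et al.'s): ordinary PCA applied to B is a plain symmetric eigenproblem. Since B has rank at most g - 1, at most g - 1 useful directions exist.

```python
# A minimal sketch (ours): pseudo canonical variates as the leading
# eigenvectors of the between-groups matrix B, i.e. ordinary PCA on B.
import numpy as np

def pca_on_between(B, k):
    """Top-k eigenvectors of the symmetric (p x p) matrix B."""
    lam, V = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:k]      # keep the k largest (k <= g - 1)
    return lam[idx], V[:, idx]
```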

Some data preprocessing might be desirable to compensate for not using the within-groups sums-of-squares

Seeking interpretable canonical variates via interpretable LDA

Canonical variates found using LDA share similarities with principal components obtained from the between-groups matrix B. So, in principle, interpretable canonical variates can be sought using the same procedures described in Section 4, that is, by finding the homogeneous, contrast or sparse approximate loadings vector that is closest to the loadings vector of each canonical variate. However, for canonical variates the orthogonality condition involves $A^\top W A$ instead of the $A^\top A$ used in PCA. In other
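The snippet breaks off here, but one standard way of handling the $A^\top W A$ condition, offered as a hedged sketch rather than the paper's own derivation, is to whiten with a Cholesky factor $W = LL^\top$: ordinary orthogonality of $Y = L^\top A$ is equivalent to W-orthogonality of A, so PCA-style machinery can be reused in the transformed coordinates and the result mapped back via $A = L^{-\top} Y$.

```python
# A minimal sketch (ours): reduce the W-orthogonal problem to an ordinary
# symmetric eigenproblem by whitening with the Cholesky factor W = L L^T.
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def whitened_lda(B, W):
    L = cholesky(W, lower=True)                      # W = L L^T
    C = solve_triangular(L, B, lower=True)           # L^{-1} B
    M = solve_triangular(L, C.T, lower=True).T       # L^{-1} B L^{-T}, symmetric
    lam, Y = np.linalg.eigh(M)                       # ordinary eigenproblem
    A = solve_triangular(L.T, Y, lower=False)        # map back: A = L^{-T} Y
    return lam[::-1], A[:, ::-1]                     # descending order
```

By construction $A^\top W A = Y^\top Y = I$, so the returned loadings satisfy the required W-orthogonality.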

Analysis of the haddock sound data via interpretable LDA

In Section 4.2 an interpretable canonical variates analysis of the haddock sound data was approximated by applying interpretable PCA to the between-groups matrix B. In this section the interpretable canonical variates analysis is repeated making use of the information in the within-groups matrix W as well as in B. That is, variants of interpretable LDA are applied to the haddock sound data, both raw and after standardization. The results are given in Table 3.

Once again the choice of η

Concluding remarks

In this paper, a simple approximate numerical procedure for LDA with interpretable loadings vectors has been considered. The simplicity and interpretability features of the loadings vectors listed in Section 1 are directly adopted from those proposed for PCA by Chipman and Gu (2005).

This work is rather exploratory, and in a sense, it raises more questions than answers. First, when is it better to carry out the simplification on raw coefficients or standardized coefficients? And are the usual LDA

Acknowledgements

The authors thank Prof Ian Jolliffe for a number of helpful comments and Dr Mark Wood for providing the haddock sound data set introduced in Section 3.2. We are grateful to the Reviewers and Prof Chris Jones for the careful reading of the manuscript and their helpful comments.

