Simple and interpretable discrimination

https://doi.org/10.1016/j.csda.2008.11.018

Abstract

A number of approaches have been proposed for constructing alternatives to principal components that are more easily interpretable while still explaining a considerable part of the data variability. One such approach is employed here to produce interpretable canonical variates and to explore their discrimination behavior, which is more complicated than in PCA because orthogonality with respect to the within-groups sums-of-squares matrix is involved. The proposed simple and interpretable canonical variates are an optimal compromise between goodness and sparseness of the approximation to the original ones, rather than an identification of the variables that dominate the discrimination. The numerical algorithms have low computational cost and are illustrated on Fisher’s iris data and on moderately large real data.

Introduction

Discriminant analysis (DA) is a descriptive multivariate technique for analyzing grouped data, i.e. data where the observations are divided into a number of groups that usually represent samples from different populations (Fisher, 1936). Recently, DA has also been viewed as a promising dimensionality reduction technique; indeed, the presence of group structure in the data further facilitates dimensionality reduction. The best known variety of DA is linear discriminant analysis (LDA), whose central goal is to describe the differences between the groups in terms of canonical variates, which are linear combinations of the original variables (Fisher, 1936). LDA requires solving a generalized eigenvalue problem (Golub and Van Loan, 1996, Section 8.7).
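As a concrete illustration of this last point, the following is our sketch rather than code from the paper: with B and W denoting the between-groups and within-groups sums-of-squares matrices introduced in Section 2, the loadings vectors of the canonical variates solve the generalized eigenproblem $Ba = \lambda Wa$, which a symmetric-definite solver handles directly. The function and variable names are ours.

```python
# A minimal sketch (ours, not the paper's): Fisher's canonical variates as
# the generalized symmetric eigenproblem  B a = lambda * W a.
import numpy as np
from scipy.linalg import eigh

def lda_directions(B, W):
    """Loadings vectors of the canonical variates, strongest first.

    B, W : (p x p) between- and within-groups sums-of-squares matrices;
    W must be positive definite (generically guaranteed when n > p).
    """
    lam, A = eigh(B, W)              # solves B a = lam W a
    order = np.argsort(lam)[::-1]    # largest between/within ratio first
    return lam[order], A[:, order]
```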

The interpretation of the canonical variates is based on the coefficients of the original variables in the linear combinations. The interpretation can be clear and obvious if the coefficients in the loadings vectors take one of a small number of values, including exact zero. Unfortunately, in many applications this is not the case. The interpretation problem is exacerbated by the fact that there are three types of coefficients (raw, standardized and structure) that can be used to describe the canonical variates (Rencher, 1992, Trendafilov and Jolliffe, 2007).

There are several approaches to the interpretation of the canonical variates, each of which has disadvantages (Rencher, 1992, Trendafilov and Jolliffe, 2007). For example, a modification of LDA, aiming for better discrimination and possibly interpretation, is considered in Krzanowski (1995). In this approach the vectors of coefficients in the canonical variates are constrained to be orthogonal.

The difficulties with interpreting canonical variates are similar to those encountered when interpreting principal components (Jolliffe, 2002, Section 11). Recently, there has been an increasing interest in approximate, but computationally simple and easy to interpret, versions of principal component analysis (PCA) (Chipman and Gu, 2005, Jolliffe and Uddin, 2000, Jolliffe et al., 2003, Rousson and Gasser, 2004, Vines, 2000). Roughly speaking, the aim of most of these works is to find “components” whose loadings vectors include exact zero coefficients. The motivation is twofold: such simplified PCA aids interpretation, and exact zeros make it easier to deal with very large data sets. Thus it is of interest whether such simplified PCA techniques can be adapted for use in LDA.

In this paper, we employ and adapt the idea of Chipman and Gu (2005). Thus, we seek approximate canonical variates whose loadings vectors can be classified in one of the following three ways: homogeneous, contrast or sparse. The resulting approximate canonical variates should therefore be much easier to interpret than their exact counterparts. In short, the approximate canonical variates (a sketch of the simplification idea follows this list):

  • require low computational cost,

  • are easy to interpret because most of the loadings are 0s,

  • approximate the original canonical variates.
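As a rough, hedged sketch of what such a simplification step can look like (our illustration in the spirit of Chipman and Gu (2005), not their exact algorithm): given a loadings vector, one can search over all "homogeneous" vectors, i.e. unit-length vectors whose nonzero entries share a common magnitude, for the one closest to it in inner product.

```python
# A minimal sketch (ours, not Chipman and Gu's exact procedure): for a
# loadings vector `a`, sweep over sparsity levels k and pick the homogeneous
# vector (entries 0 or +/- 1/sqrt(k)) with the largest inner product with `a`.
import numpy as np

def closest_homogeneous(a):
    a = np.asarray(a, dtype=float)
    order = np.argsort(np.abs(a))[::-1]        # indices by decreasing |a_i|
    best, best_score = None, -np.inf
    for k in range(1, len(a) + 1):
        v = np.zeros_like(a)
        idx = order[:k]
        v[idx] = np.sign(a[idx]) / np.sqrt(k)  # unit length, equal magnitudes
        score = abs(v @ a)                     # closeness to the original
        if score > best_score:
            best, best_score = v, score
    return best, best_score
```

For a fixed sparsity level the k largest absolute loadings with matched signs are optimal, so this sweep finds the closest homogeneous vector overall; contrast and purely sparse patterns can be handled by analogous searches.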

This is in contrast to the DALASS canonical variates (Trendafilov and Jolliffe, 2007) whose loadings vectors have the largest entries corresponding to the variables that dominate the discrimination (also see Table 5).

In Section 2 we review the mathematical theory underlying discriminant analysis and in Section 3 illustrate the use of LDA, focusing on the interpretation of loadings vectors. The calculation of interpretable canonical variates via interpretable PCA is detailed in Section 4. The ideas of Section 4 are generalized to provide a more direct approach to interpretable LDA in Section 5. This new approach is applied to a couple of data sets in Section 6. Finally, some conclusions are given in Section 7.

Section snippets

Mathematical theory of canonical variates

Suppose that measurements on $p$ variables are made and recorded for a total of $n$ observations (cases). We assume that $n > p$. Further suppose that the $n$ observations are a priori divided into $g$ groups and let $n_i$ be the number of individuals in the $i$th group, i.e. $n_1 + n_2 + \cdots + n_g = n$. Also let the $(1 \times p)$ vector $x_{ij}$ denote the measurements made on the $j$th individual belonging to the $i$th group and let the $(n \times p)$ data matrix $X$ represent the measurements over all observations.

Consider the following linear
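The snippet breaks off here, but the setup above already suffices for a minimal computational sketch. The following is our illustration, not the paper's code: it builds the within-groups matrix W and the between-groups matrix B used throughout; the `labels` array encoding group membership and all names are ours.

```python
# A minimal sketch (ours), assuming the setup above: rows of X are the n
# observations on p variables, `labels` gives the group of each row.
import numpy as np

def scatter_matrices(X, labels):
    """Within-groups (W) and between-groups (B) sums-of-squares matrices."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    xbar = X.mean(axis=0)                 # grand mean
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(labels):
        Xg = X[labels == g]               # n_i x p block of group i
        mg = Xg.mean(axis=0)              # group mean
        W += (Xg - mg).T @ (Xg - mg)      # within-group scatter
        B += len(Xg) * np.outer(mg - xbar, mg - xbar)  # between-group scatter
    return W, B
```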

Applications of canonical variates

In this section two data sets, Fisher’s iris data (Fisher, 1936) and the haddock sound data of Wood et al. (2005), are analyzed using LDA to illustrate the difficulties in interpreting canonical variates.

Seeking interpretable canonical variates via interpretable PCA

One approach to producing interpretable canonical variates is to make use of a trick proposed by Dhillon et al. (2002). They suggested that the canonical variates can be taken to be the components arising out of standard PCA applied to the between-groups sums-of-squares matrix B. This approach is particularly appropriate when the data set is large as it reduces the computational burden.
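A minimal sketch of this trick follows (our code, not Dhillon et al.'s): ordinary PCA applied to B is a plain symmetric eigenproblem. Since B has rank at most g - 1, at most g - 1 useful directions exist.

```python
# A minimal sketch (ours): pseudo canonical variates as the leading
# eigenvectors of the between-groups matrix B, i.e. ordinary PCA on B.
import numpy as np

def pca_on_between(B, k):
    """Top-k eigenvectors of the symmetric (p x p) matrix B."""
    lam, V = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:k]      # keep the k largest (k <= g - 1)
    return lam[idx], V[:, idx]
```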

Some data preprocessing might be desirable to compensate for not using the within-groups sums-of-squares

Seeking interpretable canonical variates via interpretable LDA

Canonical variates found using LDA share similarities with principal components obtained from the between-groups matrix B. So, in principle, interpretable canonical variates can be sought using the same procedures described in Section 4, that is, by finding the homogeneous, contrast or sparse approximate loadings vector that is closest to the loadings vector of each canonical variate. However, for canonical variates the orthogonality condition involves $A^\top W A$ instead of the $A^\top A$ used in PCA. In other
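The snippet breaks off here, but one standard way of handling the $A^\top W A$ condition, offered as a hedged sketch rather than the paper's own derivation, is to whiten with a Cholesky factor $W = LL^\top$: ordinary orthogonality of $Y = L^\top A$ is equivalent to W-orthogonality of A, so PCA-style machinery can be reused in the transformed coordinates and the result mapped back via $A = L^{-\top} Y$.

```python
# A minimal sketch (ours): reduce the W-orthogonal problem to an ordinary
# symmetric eigenproblem by whitening with the Cholesky factor W = L L^T.
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def whitened_lda(B, W):
    L = cholesky(W, lower=True)                      # W = L L^T
    C = solve_triangular(L, B, lower=True)           # L^{-1} B
    M = solve_triangular(L, C.T, lower=True).T       # L^{-1} B L^{-T}, symmetric
    lam, Y = np.linalg.eigh(M)                       # ordinary eigenproblem
    A = solve_triangular(L.T, Y, lower=False)        # map back: A = L^{-T} Y
    return lam[::-1], A[:, ::-1]                     # descending order
```

By construction $A^\top W A = Y^\top Y = I$, so the returned loadings satisfy the required W-orthogonality.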

Analysis of the haddock sound data via interpretable LDA

In Section 4.2 an interpretable canonical variates analysis of the haddock sound data was approximated by applying interpretable PCA to the between-groups matrix B. In this section the interpretable canonical variates analysis is repeated making use of the information in the within-groups matrix W as well as in B. That is, variants of interpretable LDA are applied to the haddock sound data, both raw and after standardization. The results are given in Table 3.

Once again the choice of η

Concluding remarks

In this paper, a simple approximate numerical procedure for LDA with interpretable loadings vectors has been considered. The simplicity and interpretability features of the loadings vectors listed in Section 1 are directly adopted from those proposed for PCA by Chipman and Gu (2005).

This work is rather exploratory, and in a sense, it raises more questions than answers. First, when is it better to carry out the simplification on raw coefficients or standardized coefficients? And are the usual LDA

Acknowledgements

The authors thank Prof Ian Jolliffe for a number of helpful comments and Dr Mark Wood for providing the haddock sound data set introduced in Section 3.2. We are grateful to the Reviewers and Prof Chris Jones for the careful reading of the manuscript and their helpful comments.

