Reciprocal curves

https://doi.org/10.1016/j.csda.2005.09.006

Abstract

Partial least squares (PLS) is a modeling technique that has met with considerable success, particularly in the fields of chemometrics and psychometrics. In this paper, we extend linear PLS to a nonlinear version called “reciprocal curves.” Reciprocal curves are smooth one-dimensional curves that pass through the center of the data, are co-consistent and share several optimality properties originally identified with PLS.

Introduction

Linear data analysis techniques are important tools to use when exploring data, but methods that allow for nonlinear structures should provide more relevant or parsimonious models, resulting in a more fruitful exploratory analysis. In this paper we extend the notion of partial least squares (PLS) into a nonlinear domain using techniques developed by Hastie and Stuetzle (1989) to extend principal components analysis (PCA) to "principal curves."

The origins of PLS can be traced to Herman Wold's original nonlinear iterative partial least squares (NIPALS) algorithm (Wold, 1966, 1982), which was developed to linearize models that are nonlinear in their parameters. The associated ad hoc estimation of parameters was accomplished through an iterative method involving linear least squares modeling and model relaxation (e.g., Varga, 1962). The NIPALS method was adapted for over-determined multiple regression problems (Wold et al., 1983); that extension of NIPALS became known as PLS, and it is still implemented in various software packages based on the original NIPALS algorithm. Statisticians have adopted different perspectives on the same paradigm, as discussed in de Jong (1993), Stone and Brooks (1990), Helland (1988), Frank and Friedman (1993), Hinkle (1995), Hinkle and Rayens (1995, 1999), Rayens and Andersen (2003, 2004), and Rayens (2000). These views are more consistent with classical multivariate statistical theory and allow for the implementation of PLS by way of solutions to well-posed eigenstructure problems that clearly exhibit the compromise PLS strikes between variance summary and correlation. Unfortunately, this eigenstructure perspective is still not universally known within the community of users, and the effect that different constraints on the structure have on the eigenvalue problem is even less well known. We will avoid peripheral discussions of these matters here by suppressing the constraints and defining PLS as follows.

Definition 1

Suppose the random vector $(X,Y)$ has $p+q>2$ elements, where $X$ is $p$-dimensional, with associated positive definite covariance matrix
\[
\Sigma = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{XY}^t & \Sigma_{YY} \end{pmatrix}.
\]
The $r$th pair of PLS component scores, where $1 \leq r \leq \min(p,q)$, is the pair of linear combinations $u_r = \alpha_r^t X$ and $v_r = \beta_r^t Y$ such that the vectors $\alpha_r$ and $\beta_r$ are unit length and maximize $\mathrm{Cov}(u_r, v_r)$. The weight vectors $\alpha_r$ and $\beta_r$ are the $r$th PLS weight vectors.

As Frank and Friedman (1993) noted, the PLS paradigm is simply a penalized version of canonical covariates analysis (CCA) in the sense that
\[
\arg\max \mathrm{Cov}^2(\alpha_r^t X, \beta_r^t Y) = \arg\max \mathrm{Var}(\alpha_r^t X)\,\mathrm{Corr}^2(\alpha_r^t X, \beta_r^t Y)\,\mathrm{Var}(\beta_r^t Y).
\]
From this we see that PLS seeks to summarize the variability in two separate spaces, subject to a penalty that requires that the two resulting summaries, $\alpha_r^t X$ and $\beta_r^t Y$, also be relatively well correlated. It is not hard to show that the first set of PLS weight vectors is determined by the coupled eigenstructure
\[
\Sigma_{XY}\beta_1 \propto \alpha_1, \qquad \Sigma_{XY}^t\alpha_1 \propto \beta_1,
\]
and that subsequent solutions depend on the particular within-space constraints on the structure (see Rayens, 2000, for specific solutions obtained without the use of Lagrange multipliers). Thus the resulting PLS structures are vectors, that is, linear approximations of the geometrical structures underlying the random vector spaces being modeled or described.
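To make this eigenstructure concrete, here is a minimal numpy sketch, written for this discussion rather than taken from the paper, that recovers the first weight pair $(\alpha_1,\beta_1)$ as the leading singular vectors of the sample cross-covariance matrix; the simulated data and the name first_pls_weights are our own illustrative choices.

```python
import numpy as np

def first_pls_weights(X, Y):
    """First PLS weight pair (alpha_1, beta_1): the leading left and
    right singular vectors of the sample cross-covariance Sigma_XY."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc / (len(X) - 1)      # sample Sigma_XY, p x q
    U, s, Vt = np.linalg.svd(Sxy)
    return U[:, 0], Vt[0], s[0]         # alpha_1, beta_1, max covariance

# toy illustration: Y depends linearly on part of X
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(200, 3))
alpha1, beta1, cov1 = first_pls_weights(X, Y)
u1 = (X - X.mean(0)) @ alpha1           # first X-space component scores
v1 = (Y - Y.mean(0)) @ beta1            # first Y-space component scores
print(np.allclose(np.cov(u1, v1)[0, 1], cov1))   # True
```

The singular-value form makes the compromise visible: the leading singular value $s_1 = \mathrm{Cov}(u_1, v_1)$ is exactly the quantity Definition 1 maximizes.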

A slightly different perspective on PLS, provided by the following proposition, sets up the framework for extending PLS to nonlinear functions:

Proposition 1

The first pair of PLS components for the random vector $(X,Y)$ minimizes the expression
\[
E\big[\|X - \alpha\alpha^t X\|^2 + (\alpha^t X - \beta^t Y)^2 + \|Y - \beta\beta^t Y\|^2\big]. \tag{3}
\]

Hinkle (1995) shows that each subsequent pair of weight vectors from Definition 1 minimizes the above expectation, provided these additional directions are constrained to be orthogonal, in each space, to those already extracted. Regardless of the constraint set, however, the first pair will always minimize the given expectation.
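Proposition 1 can also be checked numerically. For unit-length $\alpha$ and $\beta$, expanding expression (3) gives $E\|X\|^2 + E\|Y\|^2 - 2E[(\alpha^t X)(\beta^t Y)]$, so minimizing it is the same as maximizing the covariance in Definition 1. The sketch below, our own illustration using the sample analogue of expression (3), confirms that the singular-vector weights beat random unit directions:

```python
import numpy as np

def pls_objective(alpha, beta, X, Y):
    """Sample analogue of expression (3) for unit-length alpha, beta."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    u, v = Xc @ alpha, Yc @ beta
    return (((Xc - np.outer(u, alpha)) ** 2).sum()
            + ((u - v) ** 2).sum()
            + ((Yc - np.outer(v, beta)) ** 2).sum()) / len(X)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(200, 3))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
U, _, Vt = np.linalg.svd(Xc.T @ Yc / (len(X) - 1))
best = pls_objective(U[:, 0], Vt[0], X, Y)
for _ in range(1000):   # no random unit pair should do better
    a = rng.normal(size=5); a /= np.linalg.norm(a)
    b = rng.normal(size=3); b /= np.linalg.norm(b)
    assert best <= pls_objective(a, b, X, Y)
```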

The form of Eq. (3) suggests that any underlying errors-in-variables model corresponding to the PLS paradigm, at least as far as the initial set of weight vectors is concerned, will correspond to three separate models:

  • two PCA-type “outer relations”
    \[ X = \alpha u + \varepsilon \quad\text{and}\quad Y = \beta v + \zeta, \tag{4} \]

  • and one CCA-type “inner relation”
    \[ v = \delta u + \xi, \tag{5} \]
    where the “component scores” are defined as $u = \alpha^t X$ and $v = \beta^t Y$, and $\varepsilon$, $\zeta$, and $\xi$ are independent error terms. Of course, this set of errors-in-variables models is “linear” in two senses: most importantly, the “component structures” defined by the weight vectors $\alpha$ and $\beta$ are linear, and the relationship between the scores $(u,v)$ is assumed linear as well.
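As a quick generative reading of these three relations, the following sketch simulates data from the linear errors-in-variables model; the dimensions, the value of $\delta$, and the noise scales are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, delta = 500, 4, 3, 0.8

# unit-length weight vectors: the linear "component structures"
alpha = rng.normal(size=p); alpha /= np.linalg.norm(alpha)
beta = rng.normal(size=q);  beta /= np.linalg.norm(beta)

u = rng.normal(size=n)                                   # latent X-space scores
v = delta * u + 0.2 * rng.normal(size=n)                 # inner relation (5)
X = np.outer(u, alpha) + 0.1 * rng.normal(size=(n, p))   # outer relation (4)
Y = np.outer(v, beta) + 0.1 * rng.normal(size=(n, q))    # outer relation (4)
```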

In the following sections, a theory for “reciprocal curves” is developed which combines this theoretical perspective on PLS with Hastie's (1984) novel extension of principal components to “principal curves.” Note that in comparison to principal curves, which model a single random vector space, the method developed below is structurally different. In particular, variability in two spaces, each generated by different probability distributions and nonlinear structures, needs to be accounted for with an additional restriction that the resulting structures be optimally related.

Before continuing, we give two brief motivational examples. We will use both to illustrate the PLS approach to modeling, and also to suggest why the nonlinear generalization we are going to develop in Section 3 will be useful.


Wine data—relationship between component scores

These data have been analyzed in the literature before, particularly as an example of PLS linear and nonlinear modeling (see, e.g., Frank, 1990; Frank and Kowalski, 1984). The goal is to predict three subjective sensory measurements made by a panel of judges (aroma character, sugar concentration, and flavor character) on 38 Pinot Noir wine samples from the chemical composition, based on the 17 elemental concentrations (Cd, Mo, Mn, Ni, Cu, Al, Ba, Cr, Sr, Pb, B, Mg, Si, Na, Ca, P, and K) obtained for each sample.

Reciprocal curves

Recall that the form of Eq. (3) suggested that PLS essentially corresponds to three separate models: two PCA-type “outer relations” of the form $X = \alpha u + \varepsilon$ and $Y = \beta v + \zeta$, and one CCA-type “inner relation” of the form $v = \delta u + \xi$ (see Eqs. (4) and (5)). In an effort to model more general geometric structures in an analogous fashion, one could generalize this perspective on PLS as follows. Assume two “outer relations”
\[ X = f(u) + \varepsilon \quad\text{and}\quad Y = g(v) + \zeta, \]
along with an “inner relation”
\[ v = \varphi(u) + \xi, \]
where $f$ and $g$ are smooth one-dimensional curves and $\varphi$ is a smooth monotone function.
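To fix ideas before turning to the algorithm, here is a small simulation from this generalized model; the particular curves $f$ and $g$, the monotone $\varphi$, and the noise levels are illustrative choices of ours, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
u = rng.uniform(-1, 1, size=n)                   # latent X-space parameter
v = np.tanh(2 * u) + 0.05 * rng.normal(size=n)   # monotone inner relation

# outer relations: smooth one-dimensional curves in R^3 and R^2
f = lambda t: np.column_stack([t, t ** 2, np.sin(np.pi * t)])
g = lambda t: np.column_stack([t, np.exp(t)])
X = f(u) + 0.05 * rng.normal(size=(n, 3))
Y = g(v) + 0.05 * rng.normal(size=(n, 2))
```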

An algorithm for reciprocal curves

In this section, algorithms for finding reciprocal curves in both the theoretical setting and the discrete data setting are presented. The basic method is essentially an iterative one based explicitly on the definition of reciprocal curves (2) and the critical distance result mentioned above. Briefly stated, both implementations simply project, conditionally average, and then test each concurrent curve for co-consistency and convergence of the variation function.
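The sketch below illustrates that loop under several simplifying assumptions of ours: each curve is represented as a polyline over a grid of latent nodes, projection is approximated by nearest-vertex assignment, conditional averaging is done node by node, and the variation function is taken to be the total squared projection distance in both spaces plus a crude inner-relation mismatch term. The name fit_reciprocal_curves is hypothetical, and this is not the authors' implementation:

```python
import numpy as np

def nearest_node(points, curve):
    """Approximate projection: index of the nearest curve vertex."""
    d2 = ((points[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def conditional_average(points, labels, n_nodes):
    """New curve: the average of the points assigned to each latent
    node, with empty nodes filled in by linear interpolation."""
    curve = np.full((n_nodes, points.shape[1]), np.nan)
    for j in range(n_nodes):
        hit = labels == j
        if hit.any():
            curve[j] = points[hit].mean(axis=0)
    for c in range(points.shape[1]):
        ok = ~np.isnan(curve[:, c])
        curve[:, c] = np.interp(np.arange(n_nodes),
                                np.flatnonzero(ok), curve[ok, c])
    return curve

def fit_reciprocal_curves(X, Y, n_nodes=25, n_iter=100, tol=1e-8):
    # initialize both polylines along the first linear PLS directions
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc / (len(X) - 1))
    u, v = Xc @ U[:, 0], Yc @ Vt[0]
    f = X.mean(0) + np.outer(np.linspace(u.min(), u.max(), n_nodes), U[:, 0])
    g = Y.mean(0) + np.outer(np.linspace(v.min(), v.max(), n_nodes), Vt[0])
    last = np.inf
    for _ in range(n_iter):
        iu, iv = nearest_node(X, f), nearest_node(Y, g)   # project
        f = conditional_average(X, iu, n_nodes)           # E[X | u]
        g = conditional_average(Y, iv, n_nodes)           # E[Y | v]
        # crude variation function: squared distances in both spaces
        # plus a mismatch penalty between the paired latent indices
        D = (((X - f[iu]) ** 2).sum() + ((Y - g[iv]) ** 2).sum()
             + ((iu - iv) ** 2).mean())
        if abs(last - D) < tol:   # variation function has converged
            break
        last = D
    return f, g, iu, iv
```

Run on the simulated $(X, Y)$ above, fit_reciprocal_curves(X, Y) starts from the linear PLS solution and bends both polylines toward the conditional means, stopping once the variation function stabilizes.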

Wine data—relationship between component scores

Recall that when PLS is used to model these data, the inner relation exhibited is monotone but not definitively a straight line, and a straight line is all traditional PLS can model at the inner stage. Ideally, one would be able to maintain or increase the score correlation achieved by linear PLS while staying within the modeling paradigm.

The results from fitting reciprocal curves to the element- and sensory-space data are exhibited in Table 2. Notice that there was a sizeable increase in the achieved score correlation.

Conclusions

In this paper we have adapted the idea of self-consistency to one of co-consistency, allowing us to develop a coherent theory of reciprocal curves. These curves are both formal nonlinear extensions of linear PLS components and “best fitting” structures. They allow the possibility of modeling both highly nonlinear outer relations and general monotone inner relations, which facilitates modeling a curved inner relation without having to resort to an external modeling strategy.

References

  • Hastie, T., Stuetzle, W., 1989. Principal curves. J. Amer. Statist. Assoc. 84, 502–516.

  • Hastie, T., Tibshirani, R., 1990. Generalized Additive Models. Chapman and Hall, London.

  • Helland, I.S., 1988. On the structure of partial least squares regression. Comm. Statist. Ser.—Simulation and Computation 17, 581–607.