Reciprocal curves

https://doi.org/10.1016/j.csda.2005.09.006

Abstract

Partial least squares (PLS) is a modeling technique that has met with considerable success, particularly in the fields of chemometrics and psychometrics. In this paper, we extend linear PLS to a nonlinear version called “reciprocal curves.” Reciprocal curves are smooth one-dimensional curves that pass through the center of the data, are co-consistent and share several optimality properties originally identified with PLS.

Introduction

Linear data analysis techniques are important tools to use when exploring data, but methods that allow for nonlinear structures should provide more relevant or parsimonious models, resulting in a more fruitful exploratory analysis. In this paper we extend the notion of partial least squares (PLS) into a nonlinear domain using techniques developed by Hastie and Stuetzle (1989) to extend principal components analysis (PCA) to "principal curves."

The origins of PLS can be traced to Herman Wold's original nonlinear iterative partial least squares (NIPALS) algorithm (Wold, 1966, 1982), which was developed to linearize models that are nonlinear in their parameters. The associated ad hoc estimation of parameters was accomplished through an iterative method involving linear least squares modeling and model relaxation (e.g., Varga, 1962). The NIPALS method was adapted for over-determined multiple regression problems (Wold et al., 1983); that extension of NIPALS became known as PLS, and it is still implemented in various software packages based on the original NIPALS algorithm. Statisticians have adopted different perspectives on the same paradigm, as discussed in de Jong (1993), Stone and Brooks (1990), Helland (1988), Frank and Friedman (1993), Hinkle (1995), Hinkle and Rayens (1995, 1999), Rayens and Andersen (2003, 2004), and Rayens (2000). These views are more consistent with classical multivariate statistical theory and allow for the implementation of PLS by way of solutions to well-posed eigenstructure problems that clearly exhibit the compromise PLS strikes between variance summary and correlation. Unfortunately, this eigenstructure perspective is still not universally known within the community of users, and the effect that different constraints on the structure have on the eigenvalue problem is even less well known. We will avoid peripheral discussions of these matters here by suppressing the constraints and defining PLS as follows.

Definition 1

Suppose the random vector $(X,Y)$ has $p+q>2$ elements, where $X$ is $p$-dimensional, with associated positive definite covariance matrix
\[
\Sigma = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{XY}^t & \Sigma_{YY} \end{pmatrix}.
\]
The $r$th pair of PLS component scores, where $1 \leq r \leq \min(p,q)$, is the pair of linear combinations $u_r = \alpha_r^t X$ and $v_r = \beta_r^t Y$ such that the vectors $\alpha_r$ and $\beta_r$ are unit length and maximize $\mathrm{Cov}(u_r, v_r)$. The weight vectors $\alpha_r$ and $\beta_r$ are the $r$th PLS weight vectors.

As Frank and Friedman (1993) noted, the PLS paradigm is simply a penalized version of canonical covariates analysis (CCA) in the sense that
\[
\arg\max \mathrm{Cov}^2(\alpha_r^t X, \beta_r^t Y) = \arg\max \mathrm{Var}(\alpha_r^t X)\,\mathrm{Corr}^2(\alpha_r^t X, \beta_r^t Y)\,\mathrm{Var}(\beta_r^t Y).
\]
From this we see that PLS seeks to summarize the variability in two separate spaces, subject to a penalty that requires that the two resulting summaries, $\alpha_r^t X$ and $\beta_r^t Y$, also be relatively well correlated. It is not hard to show that the first set of PLS weight vectors is determined by the coupled eigenstructure
\[
\Sigma_{XY}\beta_1 \propto \alpha_1, \qquad \Sigma_{XY}^t\alpha_1 \propto \beta_1,
\]
and that subsequent solutions depend on the particular within-space constraints on the structure (see Rayens, 2000, for specific solutions obtained without the use of Lagrange multipliers). Thus the resulting PLS structures are vectors, that is, linear approximations of the geometrical structures underlying the random vector spaces being modeled or described.
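To make this eigenstructure concrete, here is a minimal numpy sketch, written for this discussion rather than taken from the paper, that recovers the first weight pair $(\alpha_1,\beta_1)$ as the leading singular vectors of the sample cross-covariance matrix; the simulated data and the name first_pls_weights are our own illustrative choices.

```python
import numpy as np

def first_pls_weights(X, Y):
    """First PLS weight pair (alpha_1, beta_1): the leading left and
    right singular vectors of the sample cross-covariance Sigma_XY."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc / (len(X) - 1)      # sample Sigma_XY, p x q
    U, s, Vt = np.linalg.svd(Sxy)
    return U[:, 0], Vt[0], s[0]         # alpha_1, beta_1, max covariance

# toy illustration: Y depends linearly on part of X
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(200, 3))
alpha1, beta1, cov1 = first_pls_weights(X, Y)
u1 = (X - X.mean(0)) @ alpha1           # first X-space component scores
v1 = (Y - Y.mean(0)) @ beta1            # first Y-space component scores
print(np.allclose(np.cov(u1, v1)[0, 1], cov1))   # True
```

The singular-value form makes the compromise visible: the leading singular value $s_1 = \mathrm{Cov}(u_1, v_1)$ is exactly the quantity Definition 1 maximizes.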

A slightly different perspective on PLS, provided by the following proposition, sets up the framework for extending PLS to nonlinear functions:

Proposition 1

The first pair of PLS components for the random vector $(X,Y)$ minimizes the expression
\[
E\big[\|X - \alpha\alpha^t X\|^2 + (\alpha^t X - \beta^t Y)^2 + \|Y - \beta\beta^t Y\|^2\big]. \tag{3}
\]

Hinkle (1995) shows that each subsequent pair of weight vectors from Definition 1 minimizes the above expectation, provided these additional directions are constrained to be orthogonal, in each space, to those already extracted. Regardless of the constraint set, however, the first pair will always minimize the given expectation.
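Proposition 1 can also be checked numerically. For unit-length $\alpha$ and $\beta$, expanding expression (3) gives $E\|X\|^2 + E\|Y\|^2 - 2E[(\alpha^t X)(\beta^t Y)]$, so minimizing it is the same as maximizing the covariance in Definition 1. The sketch below, our own illustration using the sample analogue of expression (3), confirms that the singular-vector weights beat random unit directions:

```python
import numpy as np

def pls_objective(alpha, beta, X, Y):
    """Sample analogue of expression (3) for unit-length alpha, beta."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    u, v = Xc @ alpha, Yc @ beta
    return (((Xc - np.outer(u, alpha)) ** 2).sum()
            + ((u - v) ** 2).sum()
            + ((Yc - np.outer(v, beta)) ** 2).sum()) / len(X)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(200, 3))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
U, _, Vt = np.linalg.svd(Xc.T @ Yc / (len(X) - 1))
best = pls_objective(U[:, 0], Vt[0], X, Y)
for _ in range(1000):   # no random unit pair should do better
    a = rng.normal(size=5); a /= np.linalg.norm(a)
    b = rng.normal(size=3); b /= np.linalg.norm(b)
    assert best <= pls_objective(a, b, X, Y)
```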

The form of Eq. (3) suggests that any underlying errors-in-variables model corresponding to the PLS paradigm, at least as far as the initial set of weight vectors is concerned, will correspond to three separate models:

  • two PCA-type “outer relations”
    \[ X = \alpha u + \varepsilon \quad\text{and}\quad Y = \beta v + \zeta, \tag{4} \]

  • and one CCA-type “inner relation”
    \[ v = \delta u + \xi, \tag{5} \]
    where the “component scores” are defined as $u = \alpha^t X$ and $v = \beta^t Y$, and $\varepsilon$, $\zeta$, and $\xi$ are independent error terms. Of course, this set of errors-in-variables models is “linear” in two senses: most importantly, the “component structures” defined by the weight vectors $\alpha$ and $\beta$ are linear, and the relationship between the scores $(u,v)$ is assumed linear as well.
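As a quick generative reading of these three relations, the following sketch simulates data from the linear errors-in-variables model; the dimensions, the value of $\delta$, and the noise scales are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q, delta = 500, 4, 3, 0.8

# unit-length weight vectors: the linear "component structures"
alpha = rng.normal(size=p); alpha /= np.linalg.norm(alpha)
beta = rng.normal(size=q);  beta /= np.linalg.norm(beta)

u = rng.normal(size=n)                                   # latent X-space scores
v = delta * u + 0.2 * rng.normal(size=n)                 # inner relation (5)
X = np.outer(u, alpha) + 0.1 * rng.normal(size=(n, p))   # outer relation (4)
Y = np.outer(v, beta) + 0.1 * rng.normal(size=(n, q))    # outer relation (4)
```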

In the following sections, a theory for “reciprocal curves” is developed which combines this theoretical perspective on PLS with Hastie's (1984) novel extension of principal components to “principal curves.” Note that in comparison to principal curves, which model a single random vector space, the method developed below is structurally different. In particular, variability in two spaces, each generated by different probability distributions and nonlinear structures, needs to be accounted for with an additional restriction that the resulting structures be optimally related.

Before continuing, we give two brief motivational examples. We will use both to illustrate the PLS approach to modeling, and also to suggest why the nonlinear generalization we are going to develop in Section 3 will be useful.


Wine data—relationship between component scores

These data have been analyzed in the literature before, particularly as an example of PLS linear and nonlinear modeling (see, e.g., Frank, 1990; Frank and Kowalski, 1984). The goal is to predict three subjective sensory measurements made by a panel of judges (aroma character, sugar concentration, and flavor character) on 38 Pinot Noir wine samples from the chemical composition, based on the 17 elemental concentrations (Cd, Mo, Mn, Ni, Cu, Al, Ba, Cr, Sr, Pb, B, Mg, Si, Na, Ca, P, and K) obtained for each sample.

Reciprocal curves

Recall that the form of Eq. (3) suggested that PLS essentially corresponds to three separate models: two PCA-type “outer relations” of the form $X = \alpha u + \varepsilon$ and $Y = \beta v + \zeta$, and one CCA-type “inner relation” of the form $v = \delta u + \xi$ (see Eqs. (4) and (5)). In an effort to model more general geometric structures in an analogous fashion, one could generalize this perspective on PLS as follows. Assume two “outer relations”
\[ X = f(u) + \varepsilon \quad\text{and}\quad Y = g(v) + \zeta, \]
along with an “inner relation”
\[ v = \varphi(u) + \xi, \]
where $f$ and $g$ are smooth one-dimensional curves and $\varphi$ is a smooth monotone function.
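To fix ideas before turning to the algorithm, here is a small simulation from this generalized model; the particular curves $f$ and $g$, the monotone $\varphi$, and the noise levels are illustrative choices of ours, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
u = rng.uniform(-1, 1, size=n)                   # latent X-space parameter
v = np.tanh(2 * u) + 0.05 * rng.normal(size=n)   # monotone inner relation

# outer relations: smooth one-dimensional curves in R^3 and R^2
f = lambda t: np.column_stack([t, t ** 2, np.sin(np.pi * t)])
g = lambda t: np.column_stack([t, np.exp(t)])
X = f(u) + 0.05 * rng.normal(size=(n, 3))
Y = g(v) + 0.05 * rng.normal(size=(n, 2))
```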

An algorithm for reciprocal curves

In this section, algorithms for finding reciprocal curves in both the theoretical setting and the discrete data setting are presented. The basic method is essentially an iterative one based explicitly on the definition of reciprocal curves (2) and the critical distance result mentioned above. Briefly stated, both implementations simply project, conditionally average, and then test each concurrent curve for co-consistency and convergence of the variation function.
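The sketch below illustrates that loop under several simplifying assumptions of ours: each curve is represented as a polyline over a grid of latent nodes, projection is approximated by nearest-vertex assignment, conditional averaging is done node by node, and the variation function is taken to be the total squared projection distance in both spaces plus a crude inner-relation mismatch term. The name fit_reciprocal_curves is hypothetical, and this is not the authors' implementation:

```python
import numpy as np

def nearest_node(points, curve):
    """Approximate projection: index of the nearest curve vertex."""
    d2 = ((points[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def conditional_average(points, labels, n_nodes):
    """New curve: the average of the points assigned to each latent
    node, with empty nodes filled in by linear interpolation."""
    curve = np.full((n_nodes, points.shape[1]), np.nan)
    for j in range(n_nodes):
        hit = labels == j
        if hit.any():
            curve[j] = points[hit].mean(axis=0)
    for c in range(points.shape[1]):
        ok = ~np.isnan(curve[:, c])
        curve[:, c] = np.interp(np.arange(n_nodes),
                                np.flatnonzero(ok), curve[ok, c])
    return curve

def fit_reciprocal_curves(X, Y, n_nodes=25, n_iter=100, tol=1e-8):
    # initialize both polylines along the first linear PLS directions
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc / (len(X) - 1))
    u, v = Xc @ U[:, 0], Yc @ Vt[0]
    f = X.mean(0) + np.outer(np.linspace(u.min(), u.max(), n_nodes), U[:, 0])
    g = Y.mean(0) + np.outer(np.linspace(v.min(), v.max(), n_nodes), Vt[0])
    last = np.inf
    for _ in range(n_iter):
        iu, iv = nearest_node(X, f), nearest_node(Y, g)   # project
        f = conditional_average(X, iu, n_nodes)           # E[X | u]
        g = conditional_average(Y, iv, n_nodes)           # E[Y | v]
        # crude variation function: squared distances in both spaces
        # plus a mismatch penalty between the paired latent indices
        D = (((X - f[iu]) ** 2).sum() + ((Y - g[iv]) ** 2).sum()
             + ((iu - iv) ** 2).mean())
        if abs(last - D) < tol:   # variation function has converged
            break
        last = D
    return f, g, iu, iv
```

Run on the simulated $(X, Y)$ above, fit_reciprocal_curves(X, Y) starts from the linear PLS solution and bends both polylines toward the conditional means, stopping once the variation function stabilizes.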

Wine data—relationship between component scores

Recall that when PLS is used to model these data, the inner relation exhibited is monotone but not definitively a straight line, and a straight line is all traditional PLS can model at the inner stage. Ideally, one would be able to maintain or increase the score correlation achieved by linear PLS while staying within the modeling paradigm.

The results from fitting reciprocal curves to the element- and sensory-space data are exhibited in Table 2. Notice that there was a sizeable increase in the achieved score correlation.

Conclusions

In this paper we have adapted the idea of self-consistency to one of co-consistency, allowing us to develop a coherent theory of reciprocal curves. These curves are both formal nonlinear extensions of linear PLS components and “best fitting” structures. They allow the possibility of modeling both highly nonlinear outer relations and general monotone inner relations, which facilitates modeling a curved inner relation without having to resort to an external modeling strategy.

References

  • Hastie, T., Stuetzle, W., 1989. Principal curves. J. Amer. Statist. Assoc. 84, 502–516.

  • Hastie, T., Tibshirani, R., 1990. Generalized Additive Models. Chapman and Hall, London.

  • Helland, I.S., 1988. On the structure of partial least squares regression. Comm. Statist. Ser.—Simulation and Computation 17, 581–607.