Nonlinear kernel density principal component analysis with application to climate data

Abstract

Principal component analysis (PCA) is a well-established tool for identifying the main sources of variation in multivariate data. However, as a linear method it cannot describe complex nonlinear structures. To overcome this limitation, a novel nonlinear generalization of PCA is developed in this paper. The method obtains the nonlinear principal components from ridges of the underlying density of the data. The density is estimated by using Gaussian kernels. Projection onto a ridge of such a density estimate is formulated as the solution of a differential equation, and a predictor-corrector method is developed for this purpose. The method is further extended to time series data by applying it to the phase space representation of the time series. This extension can be viewed as a nonlinear generalization of singular spectrum analysis (SSA). The ability of the nonlinear PCA to capture complex nonlinear shapes, and of its SSA-based extension to identify periodic patterns in time series, is demonstrated on climate data.


Notes

  1. The explained variances for the nonlinear principal components were obtained from the covariance matrix of the corresponding scores.

  2. Problem (45) can be equivalently formulated as an orthogonal Procrustes problem. With the matrices defined above, this problem has a unique solution (e.g. Higham 2008).

References

  • Berkooz, G., Holmes, P., Lumley, J.L.: The proper orthogonal decomposition in the analysis of turbulent flows. Annu. Rev. Fluid Mech. 25, 539–575 (1993)

  • Chacón, J.E., Duong, T., Wand, M.P.: Asymptotics for general multivariate kernel density derivative estimators. Stat. Sin. 21, 807–840 (2011)

  • Christiansen, B.: The shortcomings of nonlinear principal component analysis in identifying circulation regimes. J. Clim. 18(22), 4814–4823 (2005)

  • Damon, J.: Generic structure of two-dimensional images under Gaussian blurring. SIAM J. Appl. Math. 59(1), 97–138 (1998)

  • Delworth, T.L., Broccoli, A.J., Rosati, A., Stouffer, R.J., Balaji, V.: GFDL’s CM2 global coupled climate models. Part I: formulation and simulation characteristics. J. Clim. 19(5), 643–674 (2006)

  • Donoho, D.L., Grimes, C.: Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc. Natl. Acad. Sci. 100(10), 5591–5596 (2003)

  • Duong, T.: ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. J. Stat. Softw. 21(7), 1–16 (2007)

  • Einbeck, J., Tutz, G., Evers, L.: Local principal curves. Stat. Comput. 15(4), 301–313 (2005)

  • Einbeck, J., Evers, L., Bailer-Jones, C.: Representing complex data using localized principal components with application to astronomical data. In: Gorban, A.N., Kégl, B., Wunsch, D.C., Zinovyev, A.Y. (eds.) Principal Manifolds for Data Visualization and Dimension Reduction, volume 58 of Lecture Notes in Computational Science and Engineering, pp. 178–201. Springer, Berlin, Heidelberg (2008)

  • Genovese, C.R., Perone-Pacifico, M., Verdinelli, I., Wasserman, L.: Nonparametric ridge estimation. Ann. Stat. 42(4), 1511–1545 (2014)

  • Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996)

  • Golyandina, N., Nekrutkin, V., Zhigljavsky, A.A.: Analysis of Time Series Structure: SSA and Related Techniques. Chapman and Hall/CRC Press, London (2001)

  • Greengard, L., Strain, J.: The fast Gauss transform. SIAM J. Sci. Stat. Comput. 12(1), 79–94 (1991)

  • Higham, N.J.: Functions of Matrices: Theory and Computation. SIAM, Philadelphia (2008)

  • Hsieh, W.W.: Nonlinear multivariate and time series analysis by neural network methods. Rev. Geophys. 42(1), 1–25 (2004)

  • Hsieh, W.W., Hamilton, K.: Nonlinear singular spectrum analysis of the tropical stratospheric wind. Quart. J. R. Meteorol. Soc. 129(592), 2367–2382 (2003)

  • Jaromczyk, J.W., Toussaint, G.T.: Relative neighborhood graphs and their relatives. Proc. IEEE 80(9), 1502–1517 (1992)

  • Jolliffe, I.T.: Principal Component Analysis. Springer-Verlag, Berlin (1986)

  • Kambhatla, N., Leen, T.K.: Dimension reduction by local principal component analysis. Neural Comput. 9(7), 1493–1516 (1997)

  • Kramer, M.A.: Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 37(2), 233–243 (1991)

  • Loève, M.: Probability Theory: Foundations, Random Sequences. van Nostrand, Princeton (1955)

  • Magnus, J.R.: On differentiating eigenvalues and eigenvectors. Econ. Theory 1(2), 179–191 (1985)

  • Miller, J.: Relative critical sets in \(R^n\) and applications to image analysis. PhD thesis, University of North Carolina (1998)

  • Monahan, A.H.: Nonlinear principal component analysis: tropical Indo-Pacific sea surface temperature and sea level pressure. J. Clim. 14(2), 219–233 (2001)

  • Newbigging, S.C., Mysak, L.A., Hsieh, W.W.: Improvements to the non-linear principal component analysis method, with applications to ENSO and QBO. Atmos.-Ocean 41(4), 291–299 (2003)

  • Ortega, J.M.: Numerical Analysis: A Second Course. SIAM, Philadelphia (1990)

  • Ozertem, U., Erdogmus, D.: Locally defined principal curves and surfaces. J. Mach. Learn. Res. 12, 1249–1286 (2011)

  • Pearson, K.: On lines and planes of closest fit to systems of points in space. Philos. Mag. Ser. 6 2(11), 559–572 (1901)

  • Pulkkinen, S.: Ridge-based method for finding curvilinear structures from noisy data. Comput. Stat. Data Anal. 82, 89–109 (2015)

  • Pulkkinen, S., Mäkelä, M.M., Karmitsa, N.: A generative model and a generalized trust region Newton method for noise reduction. Comput. Optim. Appl. 57(1), 129–165 (2014)

  • Rangayyan, R.M.: Biomedical Signal Analysis: A Case-Study Approach. IEEE Press, New York (2002)

  • Renardy, M., Rogers, R.C.: An Introduction to Partial Differential Equations. Texts in Applied Mathematics, vol. 13, 2nd edn. Springer-Verlag, New York (2004)

  • Ross, I.: Nonlinear dimensionality reduction methods in climate data analysis. PhD thesis, University of Bristol, United Kingdom (2008)

  • Ross, I., Valdes, P.J., Wiggins, S.: ENSO dynamics in current climate models: an investigation using nonlinear dimensionality reduction. Nonlinear Process. Geophys. 15, 339–363 (2008)

  • Schölkopf, B., Smola, A., Müller, K.-R.: Kernel principal component analysis. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D. (eds.) Artificial Neural Networks-ICANN’97, volume 1327 of Lecture Notes in Computer Science, pp. 583–588. Springer, Berlin (1997)

  • Scholz, M., Kaplan, F., Guy, C.L., Kopka, J., Selbig, J.: Non-linear PCA: a missing data approach. Bioinformatics 21(20), 3887–3895 (2005)

  • Scholz, M., Fraunholz, M., Selbig, J.: Nonlinear principal component analysis: Neural network models and applications. In: Gorban, A.N., Kégl, B., Wunsch, D.C., Zinovyev, A.Y. (eds.) Principal Manifolds for Data Visualization and Dimension Reduction, volume 58 of Lecture Notes in Computational Science and Engineering, pp. 44–67. Springer, Berlin (2008)

  • Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)

  • Tipping, M.E., Bishop, C.M.: Probabilistic principal component analysis. J. R. Stat. Soc. 61(3), 611–622 (1999)

  • Vautard, R., Yiou, P., Ghil, M.: Singular-spectrum analysis: a toolkit for short, noisy chaotic signals. Physica D 58(1–4), 95–126 (1992)

  • Weare, B.C., Navato, A.R., Newell, R.E.: Empirical orthogonal analysis of Pacific sea surface temperatures. J. Phys. Oceanogr. 6(5), 671–678 (1976)

  • Weinberger, K., Saul, L.: Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vis. 70(1), 77–90 (2006)

  • Whittlesey, E.F.: Fixed points and antipodal points. Am. Math. Mon. 70(8), 807–821 (1963)

  • Yang, C., Duraiswami, R., Gumerov, N.A., Davis, L.: Improved fast Gauss transform and efficient kernel density estimation. In: Ninth IEEE International Conference on Computer Vision, vol. 1, pp. 664–671. Nice, France (2003)

  • Zhang, Z., Zha, H.: Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Sci. Comput. 26(1), 313–338 (2004)

Acknowledgments

The author was financially supported by the TUCS Graduate Programme, the Academy of Finland grant No. 255718 and the Finnish Funding Agency for Technology and Innovation (Tekes) grant No. 3155/31/2009. He would also like to thank Prof. Marko Mäkelä, Doc. Napsu Karmitsa and Prof. Hannu Oja (University of Turku) and the reviewers for their valuable comments.

Author information

Correspondence to Seppo Pulkkinen.

Appendix

In this appendix we give proofs of Theorems 3.2 and 3.3. In the following, we assume that the given set of sample points \(\varvec{Y}=\{\varvec{y}_i\}_{i=1}^n\subset \mathbb {R}^d\) is fixed. The proofs are carried out under the following simplifying assumption, which entails no loss of generality.

Assumption 7.1

The points \(\varvec{y}_i\) satisfy the condition \(\sum _{i=1}^n\varvec{y }_i=\varvec{0}\).

First, we recall the density estimate defined by equations (20) and (21) with the bandwidth parametrization \(\varvec{H}= h^2\varvec{I}\). That is,

$$\begin{aligned} \hat{p}_h(\varvec{x})= \frac{1}{(2\pi )^{\frac{d}{2}}h^d n}\sum _{i=1}^n\exp \left( -\frac{\Vert \varvec{x}-\varvec{y}_i\Vert ^2}{2h^2}\right) . \end{aligned}$$
(29)
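
For concreteness, the estimate (29) can be evaluated directly as in the following sketch (an illustration only, not the implementation used in the paper; for large \(n\) a fast Gauss transform would be preferable, cf. Greengard and Strain 1991; Yang et al. 2003).

```python
import numpy as np

def gaussian_kde(x, Y, h):
    """Evaluate the Gaussian kernel density estimate (29) at a point x.

    x : (d,) query point, Y : (n, d) sample points, h : bandwidth > 0.
    """
    n, d = Y.shape
    sq_dists = np.sum((Y - x) ** 2, axis=1)            # ||x - y_i||^2
    normalizer = (2.0 * np.pi) ** (d / 2.0) * h ** d * n
    return np.sum(np.exp(-sq_dists / (2.0 * h ** 2))) / normalizer

# Example: density of 200 standard normal points, evaluated at the origin.
rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 2))
print(gaussian_kde(np.zeros(2), Y, h=0.5))
```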

The following limits hold for the logarithm of the Gaussian kernel density estimate and its derivatives as \(h\) approaches infinity. Uniform convergence in a given compact set \(U\) can be verified by showing that the functions are uniformly bounded (i.e. bounded with respect to all \(\varvec{x}\in U\) and all \(h\ge h_0\) for some \(h_0>0\)) and that they have a uniform Lipschitz constant in \(U\) for all \(h\ge h_0\). Under these conditions, uniform convergence follows from the Arzelà-Ascoli theorem (e.g. Renardy and Rogers 2004). The proof of the following lemma is omitted due to space constraints.

Lemma 7.1

Let \(\hat{p}_h:\mathbb {R}^d\rightarrow \mathbb {R}\) be a Gaussian kernel density estimate and let Assumption 7.1 be satisfied. Then

$$\begin{aligned}&\lim _{h\rightarrow \infty }h^2\log {[(2\pi )^{\frac{d}{2}}h^d\hat{p}_h(\varvec{x} )]}=-\frac{1}{2n}\sum _{i=1}^n\Vert \varvec{x}-\varvec{y}_i\Vert ^2, \end{aligned}$$
(30)
$$\begin{aligned}&\lim _{h\rightarrow \infty }h^2\nabla \log \hat{p}_h(\varvec{x})= -\frac{1}{n}\sum _{i=1}^n (\varvec{x}-\varvec{y}_i)= -\varvec{x}, \end{aligned}$$
(31)
$$\begin{aligned}&\lim _{h\rightarrow \infty }\left[ h^4\nabla ^2\log \hat{p}_h(\varvec{x})+h^2 \varvec{I}\right] = \frac{1}{n}\sum _{i=1}^n\varvec{y}_i\varvec{y}_i^T \end{aligned}$$
(32)

for all \(\varvec{x}\in \mathbb {R}^d\). Furthermore, convergence to these limits is uniform in any compact set.
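
These limits are easy to probe numerically. Writing \(h^2\nabla \log {\hat{p}_h}(\varvec{x})=\sum _{i=1}^nw_i(\varvec{x};h)(\varvec{y}_i-\varvec{x})\) with softmax weights \(w_i\propto \exp (-\Vert \varvec{x}-\varvec{y}_i\Vert ^2/(2h^2))\), the weights tend to \(1/n\) as \(h\rightarrow \infty \), which gives (31) for centered data. The following sketch (an illustration, not part of the original paper) checks this behaviour:

```python
import numpy as np

def grad_log_kde(x, Y, h):
    """Closed-form gradient of log p_h for the Gaussian mixture (29):
    sum_i w_i (y_i - x) / h^2 with softmax weights w_i."""
    sq = np.sum((Y - x) ** 2, axis=1)
    w = np.exp(-(sq - sq.min()) / (2.0 * h ** 2))      # shifted for stability
    w /= w.sum()
    return (w @ (Y - x)) / h ** 2

rng = np.random.default_rng(1)
Y = rng.standard_normal((100, 3))
Y -= Y.mean(axis=0)                                    # enforce Assumption 7.1
x = np.array([0.3, -0.2, 0.5])
for h in (1.0, 10.0, 100.0):
    print(h, h ** 2 * grad_log_kde(x, Y, h))           # approaches -x by (31)
```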

The following two lemmata facilitate the proof of Theorem 3.3.

Lemma 7.2

Let \(\hat{p}_h:\mathbb {R}^d\rightarrow \mathbb {R}\) be a Gaussian kernel density estimate and let Assumptions 3.2 and 7.1 be satisfied. Denote the eigenvalues of \(\nabla ^2\log {\hat{p}_h}\) by \(\lambda _1(\cdot ;h)\ge \lambda _2(\cdot ;h)\ge \dots \ge \lambda _d(\cdot ;h)\) and the corresponding eigenvectors by \(\{\varvec{w}_i(\cdot ;h)\}_{i=1}^d\). Then for any compact set \(U\subset \mathbb {R}^d\) there exists \(h_0>0\) such that

$$\begin{aligned} \lambda _1(\varvec{x};h)&< 0,\end{aligned}$$
(33)
$$\begin{aligned} \lambda _i(\varvec{x};h)&\ne \lambda _j(\varvec{x};h) \end{aligned}$$
(34)

for all \(\varvec{x}\in U, h\ge h_0\) and \(i,j=1,2,\dots ,r+1\) such that \(i\ne j\). Furthermore, if we define

$$\begin{aligned} \varvec{W}(\varvec{x};h)= [\varvec{w}_1(\varvec{x};h)\quad \varvec{w}_2(\varvec{x};h)\quad \cdots \quad \varvec{w}_r(\varvec{x};h)] \end{aligned}$$

and

$$\begin{aligned} \varvec{V}= [\varvec{v}_1\quad \varvec{v}_2\quad \cdots \quad \varvec{v}_r], \end{aligned}$$

where \(\{\varvec{v}_i\}_{i=1}^r\) denote the eigenvectors of the matrix \(\hat{\varvec{\Sigma }}_{\varvec{Y}}\) defined by Eq. (3) corresponding to its \(r\) greatest eigenvalues, then for all \(\varepsilon >0\) there exists \(h_0>0\) such that

$$\begin{aligned} \Vert \varvec{W}(\varvec{x};h)-\varvec{V}\Vert <\varepsilon \quad \text{ for } \text{ all } \varvec{x}\in U \text{ and } h\ge h_0. \end{aligned}$$
(35)

Proof

Let \(\tilde{\lambda }_1\ge \tilde{\lambda }_2\ge \dots \ge \tilde{\lambda }_d\) denote the eigenvalues of the matrix \(\hat{\varvec{\Sigma }}_{\varvec{Y}}\) and let \(\{h_k\}\) be some sequence such that \(\lim _{k\rightarrow \infty }h_k=\infty \). By uniform convergence to the limit (32) under Assumption 7.1 and continuity of the eigenvalues of a matrix as a function of its elements (e.g. Ortega 1990, Theorem 3.1.2), for all \(\varepsilon >0\) there exists \(k_0\) such that

$$\begin{aligned} |h_k^4\lambda _i(\varvec{x};h_k)+h_k^2-\frac{n-1}{n}\tilde{\lambda }_i|< \varepsilon \end{aligned}$$
(36)

for all \(i=1,2,\dots ,r+1\), \(\varvec{x}\in U\) and \(k\ge k_0\). Consequently, condition (33) holds for all \(\varvec{x}\in U\) for any sufficiently large \(h\) by Assumption 3.2. It also follows from Assumption 3.2, condition (36) and the reverse triangle inequality that for all \(\varepsilon >0\) and \(i,j=1,2,\dots ,r+1\) such that \(i\ne j\) and \(|\tilde{ \lambda }_i-\tilde{\lambda }_j|>\varepsilon \) there exists \(k_1\) such that

$$\begin{aligned}&h_k^4|\lambda _i(\varvec{x};h_k)-\lambda _j(\varvec{x};h_k)| \\&\quad \ge \left| |h_k^4\lambda _i(\varvec{x};h_k)+h_k^2|-|h_k^4\lambda _j(\varvec{x};h_k)+h_k^2|\right| > \frac{n-1}{n}\varepsilon \end{aligned}$$

for all \(\varvec{x}\in U\) and \(k\ge k_1\). This implies condition (34). Similarly, condition (35) follows from uniform convergence to the limit (32) under Assumption 7.1, condition (34) and continuity of eigenvectors as a function of matrix elements when the eigenvalues are distinct (e.g. Ortega 1990, Theorem 3.1.3). \(\square \)
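
Condition (35) can likewise be checked numerically (an illustrative sketch, not part of the original paper): for large \(h\), the leading eigenvectors of \(\nabla ^2\log {\hat{p}_h}\) should span nearly the same subspace as the leading eigenvectors of the sample covariance. Since eigenvectors are determined only up to sign, the sketch compares the spanned subspaces through the singular values of \(\varvec{W}^T\varvec{V}\), which approach one as the subspaces align.

```python
import numpy as np

def hessian_log_kde(x, Y, h):
    """Closed-form Hessian of log p_h for the Gaussian mixture (29):
    sum_i w_i (y_i - x)(y_i - x)^T / h^4 - I / h^2 - g g^T."""
    d = Y.shape[1]
    diff = Y - x
    sq = np.sum(diff ** 2, axis=1)
    w = np.exp(-(sq - sq.min()) / (2.0 * h ** 2))
    w /= w.sum()
    g = (w @ diff) / h ** 2                            # grad log p_h
    return (diff.T * w) @ diff / h ** 4 - np.eye(d) / h ** 2 - np.outer(g, g)

rng = np.random.default_rng(2)
Y = rng.standard_normal((300, 3)) * [3.0, 1.0, 0.3]    # anisotropic cloud
Y -= Y.mean(axis=0)                                    # Assumption 7.1
W = np.linalg.eigh(hessian_log_kde(np.zeros(3), Y, 50.0))[1][:, -2:]
V = np.linalg.eigh(np.cov(Y, rowvar=False))[1][:, -2:]
print(np.linalg.svd(W.T @ V, compute_uv=False))        # both close to 1
```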

Lemma 7.3

Let Assumptions 3.2 and 7.1 be satisfied and define the function

$$\begin{aligned} \tilde{\varvec{W}}(\varvec{x};h)= \varvec{I}-\varvec{W}(\varvec{x};h)\varvec{W}(\varvec{x};h)^T, \end{aligned}$$

where the function \(\varvec{W}\) is defined as in Lemma 7.2, and the set \(S^r_{\infty }\) as in Theorem 3.3. Then the limit

$$\begin{aligned} \lim _{h\rightarrow \infty }h^2\Vert \tilde{\varvec{W}}(\varvec{x};h)\nabla \log { \hat{p}_h}(\varvec{x})\Vert \end{aligned}$$
(37)

exists for all \(\varvec{x}\in \mathbb {R}^d\). Furthermore, \(\varvec{ x}\in S^r_{\infty }\) if and only if the limit (37) is zero.

Proof

By the limit (31) and condition (35) the limit (37) exists for all \(\varvec{x}\in \mathbb {R}^d\). Furthermore, for any \(\varvec{x}\in \mathbb {R}^d\), the condition that the limit (37) is zero is equivalent to the condition that

$$\begin{aligned} \varvec{v}_i^T\varvec{x}=0\quad \text{ for } \text{ all } i=r+1,r+2,\dots ,d, \end{aligned}$$

where the vectors \(\varvec{v}_i\) are defined as in Lemma 7.2. By the orthogonality of the vectors \(\varvec{v}_i\), the definition of the set \(S^r_{\infty }\) and Assumption 7.1, this condition is equivalent to the condition that \(\varvec{x}\in S^r_{ \infty }\).\(\square \)

For the proof of Theorem 3.3, we define the set

$$\begin{aligned} S^r_h=\{\varvec{x}\in \mathbb {R}^d\mid \Vert \tilde{\varvec{W}}(\varvec{x};h)\nabla \log {\hat{p}_h}(\varvec{x})\Vert =0\}, \end{aligned}$$
(38)

where the function \(\tilde{\varvec{W}}\) is defined as in Lemma 7.3. Under Assumption 7.1, we prove both claims of Theorem 3.3 by the two lemmata that follow the illustration below.
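
The membership test defining \(S^r_h\) is directly computable. The sketch below (an illustration under the closed-form gradient and Hessian identities used above, not the paper's implementation) evaluates the residual \(\Vert \tilde{\varvec{W}}(\varvec{x};h)\nabla \log {\hat{p}_h}(\varvec{x})\Vert \); a point belongs to \(S^r_h\) precisely when this residual vanishes.

```python
import numpy as np

def ridge_residual(x, Y, h, r):
    """|| (I - W W^T) grad log p_h(x) || from definition (38), where the
    columns of W are the r leading eigenvectors of the Hessian of log p_h."""
    d = Y.shape[1]
    diff = Y - x
    sq = np.sum(diff ** 2, axis=1)
    w = np.exp(-(sq - sq.min()) / (2.0 * h ** 2))
    w /= w.sum()
    g = (w @ diff) / h ** 2                            # grad log p_h
    H = (diff.T * w) @ diff / h ** 4 - np.eye(d) / h ** 2 - np.outer(g, g)
    W = np.linalg.eigh(H)[1][:, -r:]                   # r leading eigenvectors
    return np.linalg.norm(g - W @ (W.T @ g))
```

In the limiting case described by Lemma 7.3, the residual at a point \(\varvec{x}\) tends to zero exactly when \(\varvec{x}\in S^r_{\infty }\).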

Lemma 7.4

Let \(U\subset \mathbb {R}^d\) be a compact set such that \(U\cap S^r_{\infty }\ne \emptyset \) for some \(0\le r<d\). If Assumptions 3.2 and 7.1 are satisfied, then for all \(\varepsilon >0\) there exists \(h_0>0\) such that

$$\begin{aligned} \sup _{\varvec{x}\in S^r_h\cap U}\inf _{\varvec{y}\in S^r_{\infty }}\Vert \varvec{x}-\varvec{y}\Vert <\varepsilon \quad \text{ for } \text{ all } h\ge h_0. \end{aligned}$$
(39)

Proof

The proof is by contradiction. Let \(0\le r<d\) and let \(U\subset \mathbb {R}^d\) be a compact set such that \(U\cap S^r_{\infty }\ne \emptyset \). Assume that there exists \(\varepsilon _1>0\) such that for all \(h_0>0\) there exists \(h\ge h_0\) such that condition (39) is not satisfied. This implies that for all \(h_0>0\) there exists \(h\ge h_0\) such that

$$\begin{aligned} \inf _{\varvec{y}\in S^r_{\infty }}\Vert \varvec{x}-\varvec{y}\Vert \ge \varepsilon _1\quad \text{ for } \text{ some } \varvec{x}\in S^r_h\cap U. \end{aligned}$$
(40)

Let \(\{\varvec{x}_k\}\) denote a sequence of such points \(\varvec{x}\) with the corresponding sequence \(\{h_k\}\). Since the set \(S^r_{h_k}\cap U\) is compact by the compactness of \(U\) and the continuity of \(\tilde{\varvec{W}}(\cdot ;h_k)\) in \(U\) for any sufficiently large \(h_k\), the sequence \(\{\varvec{x}_k\}\) has a convergent subsequence \(\{\varvec{z}_k\}\) whose limit point we denote by \(\varvec{z}^*\). Clearly \(\varvec{z}^*\notin S^r_{\infty }\) by condition (40). Thus, by Lemma 7.3 we deduce that for some \(c>0\),

$$\begin{aligned} \lim _{k\rightarrow \infty }\Vert \varvec{F}(\varvec{z}^*;h_k)\Vert = c, \quad \varvec{F}(\varvec{x};h)= h^2\tilde{\varvec{W}}(\varvec{x};h)\nabla \log {\hat{p}_h}(\varvec{x}). \end{aligned}$$

In view of the definition (38), the above limit implies that there exist \(\varepsilon _2>0\) and \(k_0\) such that for all \(k\ge k_0\),

$$\begin{aligned} \Vert \varvec{F}(\varvec{z}^*;h_k)-\varvec{F}(\varvec{y};h_k)\Vert \ge \varepsilon _2 \quad \text{ for } \text{ all } \varvec{y}\in S^r_{h_k}\cap U. \end{aligned}$$
(41)

On the other hand, if we define the function \(\varvec{F}^*(\varvec{x}) =-(\varvec{I}-\varvec{V}\varvec{V}^T)\varvec{x}\), the triangle inequality yields

$$\begin{aligned}&\Vert \varvec{F}(\varvec{z}^*;h_k)-\varvec{F}(\varvec{y};h_k)\Vert \\&\quad \le \Vert \varvec{F}(\varvec{z}^*;h_k)-\varvec{F}^*(\varvec{z}^*)\Vert + \Vert \varvec{F}(\varvec{y};h_k)-\varvec{F}^*(\varvec{z}^*)\Vert \\&\quad \le \Vert \varvec{F}(\varvec{z}^*;h_k)-\varvec{F}^*(\varvec{z}^*)\Vert + \Vert \varvec{F}(\varvec{y};h_k)-\varvec{F}^*(\varvec{y})\Vert \\&\qquad +\Vert \varvec{F}^*(\varvec{y})-\varvec{F}^*(\varvec{z}^*)\Vert . \end{aligned}$$

Combining this with the inequality

$$\begin{aligned} \Vert \varvec{F}^*(\varvec{y})-\varvec{F}^*(\varvec{z}^*)\Vert \le \Vert \varvec{I}-\varvec{V}\varvec{V}^T\Vert \Vert \varvec{y}-\varvec{z}^*\Vert = \Vert \varvec{y}-\varvec{z}^*\Vert \end{aligned}$$

and noting that \(\varvec{F}(\cdot ;h_k)\) converges to the function \(\varvec{F}^*\) uniformly in \(U\) as \(k\rightarrow \infty \) (by Lemmata 7.1 and 7.2), we deduce from (41) that for all \(\varepsilon _2>\varepsilon _3>0\) there exists \(k_1\) such that

$$\begin{aligned} \Vert \varvec{z}^*-\varvec{y}\Vert +\varepsilon _3\ge \Vert \varvec{F}(\varvec{z}^*;h_k)-\varvec{F}(\varvec{y};h_k)\Vert \ge \varepsilon _2 \end{aligned}$$
(42)

for all \(\varvec{y}\in S^r_{h_k}\cap U\) and \(k\ge k_1\).

Condition (42) implies that for all \(0< \varepsilon _3<\varepsilon _2\) there exists \(k_1\) such that

$$\begin{aligned} \inf _{\varvec{y}\in S^r_{h_k}\cap U}\Vert \varvec{z}^*-\varvec{y}\Vert \ge \varepsilon _2-\varepsilon _3\quad \text{ for } \text{ all } k\ge k_1. \end{aligned}$$
(43)

On the other hand, for all \(\varepsilon >0\) we have \(\varvec{z}_k\in B(\varvec{z}^*;\varepsilon )\) for any sufficiently large \(k\) due to the assumption that \(\varvec{z}_k\) converges to \(\varvec{z}^*\). If we choose \(0<\varepsilon <\varepsilon _2-\varepsilon _3\), then the sequence \(\{\varvec{x}_k\}\), whose subsequence is \(\{\varvec{z}_k\}\), has an element \(\varvec{x}_k\notin S^r_{h_k}\cap U\) for some \(k\) by condition (43). This contradicts the construction of the sequence \(\{\varvec{x}_k\}\), according to which \(\varvec{x}_k\in S^r_{h_k}\cap U\) for all \(k\). \(\square \)

Lemma 7.5

Let \(\hat{p}_h\) be a Gaussian kernel density estimate, let \(0\le r<d\), let Assumptions 3.2 and 7.1 be satisfied and define the set \(S^r_{\infty }\) as in Theorem 3.3. Then for any compact set \(U\subset \mathbb {R}^d\) such that \(U\cap S^r_{\infty }\ne \emptyset \) and any \(\varepsilon >0\) there exists \(h_0>0\) such that

$$\begin{aligned} \sup _{\varvec{x}\in S^r_{\infty }\cap U}\inf _{\varvec{y}\in \mathcal {R}^r_{\log {\hat{p}_h}}}\Vert \varvec{x}-\varvec{y}\Vert <\varepsilon \quad \text{ for } \text{ all } h\ge h_0. \end{aligned}$$
(44)

Proof

Let \(0\le r<d\) and let \(\{\varvec{v}_i\}_{i=r+1}^d\) denote a set of orthonormal eigenvectors of the matrix \(\hat{\varvec{\Sigma }}_{\varvec{ Y}}\) corresponding to the \(d-r\) smallest eigenvalues. The vectors \(\{\varvec{v}_i\}_{i=r+1}^d\) are uniquely determined up to the choice of basis because the eigenvectors \(\{\varvec{v}_i\}_{i=1}^r\) spanning their orthogonal complement are uniquely determined by Assumption 3.2. Define the sets

$$\begin{aligned} D_{\varvec{x},\varepsilon }= \left\{ \varvec{x}+\sum _{i=r+1}^d\alpha _{i-r}\varvec{v}_i\mid \sum _{i=1}^{d-r}\alpha _i^2\le \varepsilon ^2\right\} \end{aligned}$$

and

$$\begin{aligned} D_{\varepsilon }= \bigcup _{\varvec{x}\in S^r_{\infty }\cap U}D_{\varvec{x}, \varepsilon } \end{aligned}$$

for some orthonormal eigenvectors \(\{\varvec{v}_i\}_{i=r+1}^d\) spanning the orthogonal complement of \(\text{ span }(\varvec{v}_1,\varvec{v}_2,\dots , \varvec{v}_r)\).

Let \(\{\varvec{u}_i(\cdot ;h)\}_{i=1}^{d-r}\) denote a set of orthonormal vectors that are orthogonal to the eigenvectors \(\{\varvec{w}_i(\cdot ;h)\}_{i=1}^r\) of \(\nabla ^2\log {\hat{p}_h}\) corresponding to the \(r\) greatest eigenvalues. Define the functions

$$\begin{aligned} \varvec{F}(\varvec{x};h)= h^2\varvec{U}(\varvec{x};h)^T\nabla \log {\hat{p}_h(\varvec{x})} \end{aligned}$$

and

$$\begin{aligned} \tilde{\varvec{F}}_{\varvec{x}_0}(\varvec{y};h)= h^2\varvec{U}(\bar{\varvec{V}}\varvec{y}+\varvec{x}_0 ;h)^T\nabla \log {\hat{p}_h(\bar{\varvec{V}}\varvec{y}+\varvec{x }_0)}, \end{aligned}$$

where

$$\begin{aligned} \varvec{U}(\varvec{x};h)= [\varvec{u}_1(\varvec{x};h)\quad \varvec{u}_2(\varvec{x};h) \quad \cdots \quad \varvec{u}_{d-r}(\varvec{x};h)] \end{aligned}$$

and \(\bar{\varvec{V}}=[\varvec{v}_{r+1}\quad \varvec{v}_{r+2}\quad \cdots \quad \varvec{v}_d]\) assuming that the orientation is chosen so that \(\text{ det }(\bar{\varvec{V}})=1\). To fix the orientation of the vectors \(\varvec{u}_i(\varvec{x};h)\), we impose the constraint

$$\begin{aligned} \varvec{U}(\varvec{x};h)= \arg \min _{\varvec{U}'\in Q_{\varvec{x},h}}\Vert \varvec{U}'-\bar{ \varvec{V}}\Vert _F. \end{aligned}$$
(45)

Here \(\Vert \cdot \Vert _F\) denotes the Frobenius norm,

$$\begin{aligned} Q_{\varvec{x},h}&= \{\varvec{U}'\in O(d,d-r)\mid \varvec{U}'^T\varvec{W}(\varvec{x};h)=\varvec{0},\\&\quad \text{ det }(\varvec{U}')=1\}, \end{aligned}$$

\(O(d,d-r)\) denotes the set of \(d\times (d-r)\) matrices having orthonormal columns and the matrix \(\varvec{W}(\varvec{x};h)\) is defined as in Lemma 7.2. It can be shown that the function \(\varvec{U}(\cdot ;h)\) is well-defined for any \(h>0\) (see Note 2). Since its columns span the orthogonal complement of the columns of \(\varvec{W}(\cdot ;h)\), the function \(\varvec{U}(\cdot ;h)\) is continuous in a given compact set whenever \(\varvec{W}(\cdot ;h)\) is continuous there, that is, whenever condition (34) is satisfied in that set by Lemma 7.2.
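
As Note 2 observes, problem (45) is an orthogonal Procrustes problem with a unique solution (Higham 2008). The following sketch illustrates the construction (an illustration with a hypothetical helper name, not the paper's implementation; the orientation constraint \(\text{ det }(\varvec{U}')=1\) is omitted and would be enforced by flipping the sign of one column of the solution when necessary).

```python
import numpy as np

def align_complement_basis(W, V_bar):
    """Solve (45) up to orientation: among matrices U' with orthonormal
    columns spanning the orthogonal complement of span(W), find the one
    closest to V_bar in Frobenius norm."""
    d, r = W.shape
    # B: any orthonormal basis of the complement of span(W), taken from
    # the last d - r left singular vectors of W.
    B = np.linalg.svd(W, full_matrices=True)[0][:, r:]
    # U' = B Q with Q orthogonal; minimizing ||B Q - V_bar||_F is an
    # orthogonal Procrustes problem with solution Q = Us @ Vts, where
    # B^T V_bar = Us diag(s) Vts is a singular value decomposition.
    Us, _, Vts = np.linalg.svd(B.T @ V_bar)
    return B @ (Us @ Vts)
```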

The above definitions and condition (35) in the compact set \(D_{\varepsilon }\) imply that for all \(\varepsilon _1,\varepsilon _2 >0\) there exists \(h_0>0\) such that

$$\begin{aligned} \Vert \varvec{U}(\varvec{x};h)-\bar{\varvec{V}}\Vert <\varepsilon _2 \quad \text{ for } \text{ all } \varvec{x}\in D_{\varepsilon _1} \text{ and } h\ge h_0. \end{aligned}$$

Consequently, uniform convergence to the limit (31) as \(h\rightarrow \infty \) (Lemma 7.1) together with the property that

$$\begin{aligned} \bar{\varvec{V}}^T(\bar{\varvec{V}}\varvec{y}+\varvec{x}_0)= \varvec{y}\quad \text{ for } \text{ all } \varvec{y}\in \mathbb {R}^{d-r} \text{ and } \varvec{x}_0\in S^r_{\infty } \end{aligned}$$

following from Assumption 7.1 implies that for all \(\varepsilon _1,\varepsilon _2>0\) there exists \(h_0>0\) such that

$$\begin{aligned} \begin{aligned}&\Vert \tilde{\varvec{F}}_{\varvec{x}_0}(\varvec{y};h)-(-\varvec{y})\Vert <\varepsilon _2 \\&\text{ for } \text{ all } \varvec{x}_0\in S^r_{\infty }\cap U,\varvec{y}\in \tilde{D}_{\varepsilon _1} \text{ and } h\ge h_0, \end{aligned} \end{aligned}$$

where \(\tilde{D}_{\varepsilon }=\{\varvec{y}\in \mathbb {R}^{d-r}\mid \Vert \varvec{y}\Vert \le \varepsilon \}\).

By the above condition, for any \(0<\varepsilon _2<\varepsilon _1\) there exists \(h_0>0\) such that for all \(h\ge h_0\) and \(\varvec{x}_0\in S^r_{\infty }\cap U\) we have \(-\tilde{\varvec{F}}_{\varvec{x}_0}(\varvec{y};h)^T\varvec{y}>0\) for all \(\varvec{y}\in \partial \tilde{D}_{\varepsilon _1}\), where \(\partial \) denotes the boundary of a set. On the other hand, \(-\varvec{y}\) is the inward-pointing normal vector of the disk \(\tilde{D}_{\varepsilon _1}\) at any \(\varvec{y}\in \partial \tilde{D}_{\varepsilon _1}\). Together with the continuity of \(\tilde{\varvec{F}}_{\varvec{x}_0}(\cdot ;h)\) in \(\tilde{D}_{\varepsilon _1}\) when \(h\) is sufficiently large, well-known results from topology (e.g. Whittlesey 1963) then imply that \(\tilde{\varvec{F}}_{\varvec{x}_0}(\cdot ;h)\) has at least one zero point \(\varvec{y}^*\) in the interior of \(\tilde{D}_{\varepsilon _1}\) for all \(\varvec{x}_0\in S^r_{\infty }\cap U\) and \(h\ge h_0\). Clearly, for any such \(\varvec{y}^*\) and \(\varvec{x}_0\) the point \(\varvec{x}^*=\bar{\varvec{V}}\varvec{y}^*+\varvec{x}_0\) lies in the set \(D_{\varvec{x}_0,\varepsilon _1}\) and \(\varvec{F}(\varvec{x}^*;h)=\tilde{\varvec{F}}_{\varvec{x}_0}(\varvec{y}^*;h)=\varvec{0}\).

From the above we conclude that for all \(\varepsilon >0\) there exists \(h_0>0\) such that for all \(\varvec{x}_0\in S^r_{\infty }\cap U\) condition (6a) holds for \(\log {\hat{p}_h}\) at least at one point in \(D_{\varvec{x}_0,\varepsilon }\) for all \(h\ge h_0\). On the other hand, for all \(\varepsilon >0\) conditions (6b) and (6c) are satisfied in the compact set \(D_{\varepsilon }\) for all sufficiently large \(h\) by conditions (33) and (34). Hence, we have proven that for all \(\varepsilon >0\) condition (44) holds for all sufficiently large \(h\). \(\square \)

Proof (Theorem 3.3)

Follows directly from Lemmata 7.4 and 7.5 by the property that \(\mathcal {R}^r_{ \hat{p}_h}=\mathcal {R}^r_{\log {\hat{p}_h}}\subseteq S^r_h\) for all \(0\le r<d\) and \(h>0\) by Lemma 3.1 and Definition 3.1. \(\square \)

Next, we prove Theorem 3.2 under Assumption 7.1 by using the following lemma.

Lemma 7.6

Let \(\hat{p}_h\) be a Gaussian kernel density estimate, let Assumption 7.1 be satisfied and define the set

$$\begin{aligned} U_h=\bigcup _{i=1}^n\{\varvec{x}\in \mathbb {R}^d\mid \log {\hat{p}_h}( \varvec{x})\ge \log {\hat{p}_h}(\varvec{y}_i)\}. \end{aligned}$$

Then for some \(r>\max _{i=1,2,\dots ,n}\Vert \varvec{y}_i\Vert \) there exists \(h_0>0\) such that \(U_h\subseteq B(\varvec{0};r)\) for all \(h\ge h_0\).

Proof

The proof is by contradiction. Assume that for all \(r>r_0=\max _{i=1,2,\dots ,n}\Vert \varvec{y}_i\Vert \) and \(h_0>0\) there exist \(h\ge h_0\) and \(\varvec{x}\in U_h\setminus B(\varvec{0};r)\). Let \(\{\varvec{x}_k\}\), \(\{r_k\}\) and \(\{h_k\}\) denote sequences satisfying these properties such that \(\{r_k\}\) and \(\{h_k\}\) are monotonically increasing. This implies that

$$\begin{aligned} \Vert \varvec{x}_k\Vert >r_k>r_0=\max _{i=1,2,\dots ,n}\Vert \varvec{y}_i\Vert \quad \text{ for } \text{ all } k\ge k_0 \end{aligned}$$
(46)

and also that for all \(k\ge k_0\),

$$\begin{aligned} \log {\hat{p}}_{h_k}(\varvec{x}_k)\ge \log {\hat{p}}_{h_k}(\varvec{y }_j)\quad \text{ for } \text{ some } j\in \{1,2,\dots ,n\}. \end{aligned}$$
(47)

By Assumption 7.1 and condition (46) we have that \(\Vert \varvec{x}_k-\varvec{y}_i\Vert \ge \Vert \varvec{x}_k\Vert -r_0\) for all \(k\ge k_0\) and \(i=1,2,\dots ,n\). Consequently,

$$\begin{aligned}&h_k^2\log {\left[ \frac{1}{n}\sum _{i=1}^n\exp \left( -\frac{\Vert \varvec{x}_k-\varvec{y}_i\Vert ^2}{2h_k^2}\right) \right] } \\&\quad \le h_k^2\log {\left[ \exp \left( -\frac{(\Vert \varvec{x}_k\Vert -r_0)^2}{2h_k^2}\right) \right] }= -\frac{(\Vert \varvec{x}_k\Vert -r_0)^2}{2} \end{aligned}$$

for all \(k\ge k_0\). By equation (29), this implies that

$$\begin{aligned} h_k^2[\log {\hat{p}}_{h_k}(\varvec{x}_k)+\log {[(2\pi )^{\frac{d}{2}}h_k^d] }]\le -\frac{(\Vert \varvec{x}_k\Vert -r_0)^2}{2} \end{aligned}$$
(48)

for all \(k\ge k_0\). On the other hand, by the limit (30), Assumption 7.1 and the choice of \(r_0\) we have

$$\begin{aligned}&\lim _{k\rightarrow \infty }h_k^2[\log {\hat{p}_{h_k}(\varvec{y}_j)}+\log {[(2\pi )^{\frac{d}{2}}h_k^d]}]\nonumber \\&\quad = -\frac{1}{2n}\sum _{i=1}^n\Vert \varvec{y}_j-\varvec{y}_i\Vert ^2 \nonumber \\&\quad = -\frac{1}{2n}\left( \sum _{i=1}^n\Vert \varvec{y}_j\Vert ^2-2\sum _{i=1}^n \varvec{y}_j^T\varvec{y}_i+\sum _{i=1}^n\Vert \varvec{y}_i\Vert ^2\right) \ge -r_0^2\nonumber \\ \end{aligned}$$
(49)

for all \(j=1,2,\dots ,n\). Combining inequality (48) and limit (49) with condition (47) then leads to a contradiction for any sufficiently large \(k\), since \(\lim _{k\rightarrow \infty }\Vert \varvec{x}_k\Vert =\infty \) by condition (46) and the assumption that the sequence \(\{r_k\}\) is monotonically increasing. \(\square \)

Proof (Theorem 3.2)

By Lemma 7.6 there exists

$$\begin{aligned} r>\max _{i=1,2,\dots ,n}\Vert \varvec{y}_i\Vert \end{aligned}$$

such that \(U_h\subseteq B(\varvec{0};r)\) for all sufficiently large \(h\). Thus, condition (9) for all \(\varvec{x}\in U_h\) and such \(h\) follows from Assumption 3.2, compactness of the set \(B(\varvec{0};r)\) and Lemma 7.2. Finally, compactness and connectedness of the set \(U_h\) for all sufficiently large \(h\) follows from the strict concavity of \(\log {\hat{p}_h}\) in \(B(\varvec{0};r)\supseteq U_h\) by condition (33).\(\square \)


Cite this article

Pulkkinen, S. Nonlinear kernel density principal component analysis with application to climate data. Stat Comput 26, 471–492 (2016). https://doi.org/10.1007/s11222-014-9539-0
