Abstract
In this work, we consider dimensionality reduction in supervised settings and, specifically, we focus on regression problems. A novel algorithm, the supervised distance preserving projection (SDPP), is proposed. The SDPP locally minimizes the difference between pairwise distances among the projected input covariates and the corresponding distances among the responses. As a result, the local geometrical structure of the low-dimensional subspace retrieved by the SDPP mimics that of the response space, which not only facilitates efficient regressor design but also uncovers useful information for visualization. The SDPP achieves this by learning a linear parametric mapping and can therefore easily handle out-of-sample data points. For nonlinear data, a kernelized version of the SDPP is also derived. In addition, an intuitive extension of the SDPP is proposed for classification problems. The experimental evaluation on both synthetic and real-world data sets demonstrates the effectiveness of the SDPP, showing that it performs comparably to or better than state-of-the-art approaches.
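To make the idea concrete, the following is a minimal NumPy sketch of the kind of objective the abstract describes: pairwise distances among the projected covariates are matched, over local neighbourhoods, to pairwise distances among the responses. The function name, the use of squared Euclidean distances, the k-nearest-neighbour adjacency matrix G built in the input space, and the assumption that Y is an n-by-q response matrix are illustrative choices, not the authors' implementation.

```python
import numpy as np

def sdpp_objective(W, X, Y, n_neighbors=5):
    """Sketch of J(W) = (1/n) * sum_{ij} G_ij * (||W^T x_i - W^T x_j||^2 - ||y_i - y_j||^2)^2,
    with G a k-nearest-neighbour adjacency matrix built in the input space (assumed here)."""
    n = X.shape[0]
    dX = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)      # input-space squared distances
    G = np.zeros((n, n))
    nn = np.argsort(dX, axis=1)[:, 1:n_neighbors + 1]        # k nearest neighbours, skipping self
    G[np.arange(n)[:, None], nn] = 1.0                       # keep only local pairs
    Z = X @ W                                                 # projected covariates, n x r
    D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)        # distances after projection
    Delta = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)    # distances among responses
    return (G * (D - Delta) ** 2).sum() / n
```

A gradient-based or semidefinite solver would then minimize this objective over the projection matrix W; the sketch only illustrates what is being matched.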
Appendix A: Derivation of a Compact Form for \({\nabla_{\mathbf{W}} J}\)
We want to rewrite the gradient
\[
\nabla_{\mathbf{W}} J = \frac{4}{n} \sum_{ij} \mathbf{G}_{ij}\left(\mathbf{D}_{ij} - \varvec{\varDelta}_{ij}\right) \varvec{\tau}_{ij} \varvec{\tau}_{ij}^T \mathbf{W},
\]
with \(\varvec{\tau}_{ij} = \mathbf{x}_i - \mathbf{x}_j\), into a more compact form. First, we denote \(\mathbf{Q} = \mathbf{G} \odot (\mathbf{D} - \varvec{\varDelta})\), where \(\odot\) represents the element-wise product of two matrices, define the symmetric matrix \(\mathbf{R} = \mathbf{Q} + \mathbf{Q}^T\), and let \(\mathbf{S}\) be the diagonal matrix with \(\mathbf{S}_{ii} = \sum_{j} \mathbf{R}_{ij}\). Since \(\varvec{\tau}_{ij}\varvec{\tau}_{ij}^T = \varvec{\tau}_{ji}\varvec{\tau}_{ji}^T\), the coefficients can be symmetrized and \(\nabla_{\mathbf{W}} J\) manipulated as follows:
\[
\nabla_{\mathbf{W}} J
= \frac{4}{n} \sum_{ij} \mathbf{Q}_{ij}\, \varvec{\tau}_{ij} \varvec{\tau}_{ij}^T \mathbf{W}
= \frac{2}{n} \sum_{ij} \mathbf{R}_{ij} (\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T \mathbf{W}
= \frac{4}{n} \left( \mathbf{X}^T \mathbf{S} \mathbf{X} - \mathbf{X}^T \mathbf{R} \mathbf{X} \right) \mathbf{W}
= \frac{4}{n}\, \mathbf{X}^T \mathbf{L} \mathbf{X} \mathbf{W},
\]
where each row of the data matrix \(\mathbf{X}\) is a data point \(\mathbf{x}_i\) and \(\mathbf{L} = \mathbf{S} - \mathbf{R}\) is the Laplacian matrix.
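As a quick plausibility check of the identity above, the following NumPy snippet compares the element-wise sum form of the gradient with the compact form \(\frac{4}{n}\,\mathbf{X}^T \mathbf{L} \mathbf{X} \mathbf{W}\). The neighbourhood weights G and the response-space distances Delta are arbitrary stand-ins; only the algebraic equivalence is being verified.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 20, 6, 2
X = rng.standard_normal((n, d))               # rows are data points x_i
W = rng.standard_normal((d, r))
G = rng.random((n, n))                        # stand-in neighbourhood weights
Z = X @ W
D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # D_ij = ||W^T x_i - W^T x_j||^2
Delta = rng.random((n, n))                    # stand-in for response-space distances

# Element-wise sum form of the gradient
grad_sum = np.zeros_like(W)
for i in range(n):
    for j in range(n):
        tau = (X[i] - X[j])[:, None]          # tau_ij = x_i - x_j
        grad_sum += G[i, j] * (D[i, j] - Delta[i, j]) * (tau @ tau.T) @ W
grad_sum *= 4.0 / n

# Compact form: (4/n) X^T L X W with L = S - R, R = Q + Q^T, Q = G ⊙ (D - Delta)
Q = G * (D - Delta)
R = Q + Q.T
L = np.diag(R.sum(axis=1)) - R
grad_compact = (4.0 / n) * X.T @ L @ X @ W

print(np.allclose(grad_sum, grad_compact))    # True
```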
Cite this article
Zhu, Z., Similä, T. & Corona, F. Supervised Distance Preserving Projections. Neural Process Lett 38, 445–463 (2013). https://doi.org/10.1007/s11063-013-9285-x