Supervised Distance Preserving Projections

Abstract

In this work, we consider dimensionality reduction in supervised settings and, specifically, we focus on regression problems. A novel algorithm, the supervised distance preserving projection (SDPP), is proposed. The SDPP locally minimizes the difference between pairwise distances among projected input covariates and the corresponding distances among responses. As a result, the local geometrical structure of the low-dimensional subspace retrieved by the SDPP mimics that of the response space, which not only facilitates efficient regressor design but also uncovers useful information for visualization. The SDPP achieves this goal by learning a linear parametric mapping and can therefore easily handle out-of-sample data points. For nonlinear data, a kernelized version of the SDPP is also derived. In addition, an intuitive extension of the SDPP is proposed for classification problems. The experimental evaluation on both synthetic and real-world data sets demonstrates the effectiveness of the SDPP, which performs comparably or superiorly to state-of-the-art approaches.
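The description above can be made concrete with a short sketch. The following NumPy snippet evaluates a loss of the form the SDPP minimizes, a locally weighted squared difference between pairwise projected-input distances and response distances, and computes its gradient using the compact form derived in Appendix A. The k-nearest-neighbour graph over the inputs, the function name sdpp_objective_grad, and the single plain gradient step are assumptions for illustration only, not the authors' implementation or optimizer.

```python
import numpy as np

def sdpp_objective_grad(W, X, Y, G):
    """SDPP-style loss and gradient (illustrative sketch, not the authors' code).

    J(W) = (1/n) * sum_ij G_ij * (D_ij - Delta_ij)^2, where
      D_ij     = squared distance between the projected inputs W^T x_i and W^T x_j,
      Delta_ij = squared distance between the responses y_i and y_j,
      G        = a neighbourhood graph over the inputs (0/1 weights).
    The gradient uses the compact form (4/n) X^T (S - R) X W from Appendix A.
    """
    n = X.shape[0]
    Z = X @ W                                                # projected inputs, one per row
    D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)       # pairwise squared distances in the subspace
    Delta = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # pairwise squared response distances
    Q = G * (D - Delta)
    J = (Q * (D - Delta)).sum() / n
    R = Q + Q.T
    S = np.diag(R.sum(axis=1))
    grad = (4.0 / n) * X.T @ (S - R) @ X @ W
    return J, grad

# Toy usage with a 5-nearest-neighbour graph over the inputs (all choices are ad hoc).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
Y = X[:, :1] ** 2 + 0.1 * rng.standard_normal((100, 1))      # scalar response
dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
knn = np.argsort(dists, axis=1)[:, 1:6]                      # indices of the 5 nearest neighbours
G = np.zeros((100, 100))
np.put_along_axis(G, knn, 1.0, axis=1)
W = 0.1 * rng.standard_normal((10, 2))
J, grad = sdpp_objective_grad(W, X, Y, G)
W_new = W - 1e-3 * grad                                      # one illustrative gradient step
```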

Author information

Corresponding author

Correspondence to Francesco Corona.

Appendix A: Derivation of a Compact Form for \(\nabla_{\mathbf{W}} J\)

We want to rewrite the gradient \(\nabla_{\mathbf{W}} J = \frac{4}{n} \sum_{ij} \mathbf{G}_{ij}(\mathbf{D}_{ij}-\boldsymbol{\Delta}_{ij})\, \boldsymbol{\tau}_{ij} \boldsymbol{\tau}_{ij}^T \mathbf{W}\) in a more compact form. First, denote \(\mathbf{Q} = \mathbf{G} \odot (\mathbf{D}-\boldsymbol{\Delta})\), where \(\odot\) is the element-wise product of two matrices, let \(\mathbf{R} = \mathbf{Q} + \mathbf{Q}^T\) be the resulting symmetric matrix, and let \(\mathbf{S}\) be the diagonal matrix with \(\mathbf{S}_{ii} = \sum_{j}\mathbf{R}_{ij}\). Recalling that \(\boldsymbol{\tau}_{ij} = \mathbf{x}_i - \mathbf{x}_j\), we manipulate \(\nabla_{\mathbf{W}} J\) as follows:

$$\begin{aligned}
\nabla_{\mathbf{W}} J &= \frac{4}{n} \sum_{ij} \mathbf{Q}_{ij}\, (\mathbf{x}_i-\mathbf{x}_j)(\mathbf{x}_i-\mathbf{x}_j)^T \mathbf{W} \\
&= \frac{4}{n} \sum_{ij} \left( \mathbf{x}_i \mathbf{Q}_{ij}\mathbf{x}_i^T + \mathbf{x}_j \mathbf{Q}_{ij}\mathbf{x}_j^T - \mathbf{x}_i \mathbf{Q}_{ij}\mathbf{x}_j^T - \mathbf{x}_j \mathbf{Q}_{ij}\mathbf{x}_i^T \right) \mathbf{W} \\
&= \frac{4}{n} \left[ \sum_{ij}\mathbf{x}_i(\mathbf{Q}_{ij} + \mathbf{Q}_{ji})\mathbf{x}_i^T - \sum_{ij}\mathbf{x}_i(\mathbf{Q}_{ij} + \mathbf{Q}_{ji})\mathbf{x}_j^T \right] \mathbf{W} \\
&= \frac{4}{n} \left( \sum_{ij}\mathbf{x}_i\mathbf{R}_{ij}\mathbf{x}_i^T - \sum_{ij}\mathbf{x}_i\mathbf{R}_{ij}\mathbf{x}_j^T \right) \mathbf{W} \\
&= \frac{4}{n} \left( \sum_{i}\mathbf{x}_i \Big(\sum_{j}\mathbf{R}_{ij}\Big) \mathbf{x}_i^T - \sum_{ij}\mathbf{x}_i\mathbf{R}_{ij}\mathbf{x}_j^T \right) \mathbf{W} \\
&= \frac{4}{n} \left( \sum_{i}\mathbf{x}_i \mathbf{S}_{ii} \mathbf{x}_i^T - \sum_{ij}\mathbf{x}_i\mathbf{R}_{ij}\mathbf{x}_j^T \right) \mathbf{W} \\
&= \frac{4}{n} \left( \mathbf{X}^T\mathbf{S}\mathbf{X} - \mathbf{X}^T\mathbf{R}\mathbf{X} \right) \mathbf{W} \\
&= \frac{4}{n}\, \mathbf{X}^T(\mathbf{S} - \mathbf{R})\,\mathbf{X}\mathbf{W},
\end{aligned}$$

where each row of the data matrix \(\mathbf{X}\) is a data point \(\mathbf{x}_i^T\) and \(\mathbf{L} = \mathbf{S} - \mathbf{R}\) is the Laplacian matrix.
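As a simple numerical check on this derivation, the snippet below compares the naive double sum with the compact form \(\frac{4}{n}\,\mathbf{X}^T(\mathbf{S}-\mathbf{R})\,\mathbf{X}\mathbf{W}\) on random data. The random matrix Q merely stands in for \(\mathbf{G} \odot (\mathbf{D}-\boldsymbol{\Delta})\); the sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 30, 5, 2                       # samples, input dimension, projected dimension
X = rng.standard_normal((n, d))          # rows are data points x_i
W = rng.standard_normal((d, m))          # projection matrix
Q = rng.standard_normal((n, n))          # stands in for G ⊙ (D − Δ); need not be symmetric

# Naive gradient: (4/n) * sum_ij Q_ij (x_i − x_j)(x_i − x_j)^T W
grad_naive = np.zeros((d, m))
for i in range(n):
    for j in range(n):
        tau = (X[i] - X[j])[:, None]     # column vector x_i − x_j
        grad_naive += Q[i, j] * (tau @ tau.T) @ W
grad_naive *= 4.0 / n

# Compact form: (4/n) X^T (S − R) X W with R = Q + Q^T and S = diag(row sums of R)
R = Q + Q.T
S = np.diag(R.sum(axis=1))
grad_compact = (4.0 / n) * X.T @ (S - R) @ X @ W

print(np.allclose(grad_naive, grad_compact))   # True
```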

About this article

Cite this article

Zhu, Z., Similä, T. & Corona, F. Supervised Distance Preserving Projections. Neural Process Lett 38, 445–463 (2013). https://doi.org/10.1007/s11063-013-9285-x
