Abstract
In this work, we consider dimensionality reduction in supervised settings and, specifically, we focus on regression problems. A novel algorithm, the supervised distance preserving projection (SDPP), is proposed. The SDPP locally minimizes the difference between pairwise distances among the projected input covariates and the corresponding distances among the responses. As a result, the local geometrical structure of the low-dimensional subspace retrieved by the SDPP mimics that of the response space, which not only facilitates efficient regressor design but also uncovers useful information for visualization. The SDPP achieves this by learning a linear parametric mapping and can therefore easily handle out-of-sample data points. For nonlinear data, a kernelized version of the SDPP is also derived. In addition, an intuitive extension of the SDPP is proposed for classification problems. The experimental evaluation on both synthetic and real-world data sets demonstrates the effectiveness of the SDPP, showing that it performs comparably to or better than state-of-the-art approaches.
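To make the idea concrete, the following is a minimal NumPy sketch of the kind of objective the abstract describes: pairwise distances among the projected covariates are matched, over local neighbourhoods, to pairwise distances among the responses. The function name, the use of squared Euclidean distances, the k-nearest-neighbour adjacency matrix G built in the input space, and the assumption that Y is an n-by-q response matrix are illustrative choices, not the authors' implementation.

```python
import numpy as np

def sdpp_objective(W, X, Y, n_neighbors=5):
    """Sketch of J(W) = (1/n) * sum_{ij} G_ij * (||W^T x_i - W^T x_j||^2 - ||y_i - y_j||^2)^2,
    with G a k-nearest-neighbour adjacency matrix built in the input space (assumed here)."""
    n = X.shape[0]
    dX = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)      # input-space squared distances
    G = np.zeros((n, n))
    nn = np.argsort(dX, axis=1)[:, 1:n_neighbors + 1]        # k nearest neighbours, skipping self
    G[np.arange(n)[:, None], nn] = 1.0                       # keep only local pairs
    Z = X @ W                                                 # projected covariates, n x r
    D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)        # distances after projection
    Delta = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)    # distances among responses
    return (G * (D - Delta) ** 2).sum() / n
```

A gradient-based or semidefinite solver would then minimize this objective over the projection matrix W; the sketch only illustrates what is being matched.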
Appendix A: Derivation of a Compact Form for \({\nabla_{\mathbf{W}} J}\)
We want to rewrite the gradient
\[
\nabla_{\mathbf{W}} J = \frac{4}{n} \sum_{ij} \mathbf{G}_{ij}\left(\mathbf{D}_{ij} - \varvec{\varDelta}_{ij}\right) \varvec{\tau}_{ij} \varvec{\tau}_{ij}^T \mathbf{W},
\]
with \(\varvec{\tau}_{ij} = \mathbf{x}_i - \mathbf{x}_j\), into a more compact form. First, we denote \(\mathbf{Q} = \mathbf{G} \odot (\mathbf{D} - \varvec{\varDelta})\), where \(\odot\) represents the element-wise product of two matrices, define the symmetric matrix \(\mathbf{R} = \mathbf{Q} + \mathbf{Q}^T\), and let \(\mathbf{S}\) be the diagonal matrix with \(\mathbf{S}_{ii} = \sum_{j} \mathbf{R}_{ij}\). Since \(\varvec{\tau}_{ij}\varvec{\tau}_{ij}^T = \varvec{\tau}_{ji}\varvec{\tau}_{ji}^T\), the coefficients can be symmetrized and \(\nabla_{\mathbf{W}} J\) manipulated as follows:
\[
\nabla_{\mathbf{W}} J
= \frac{4}{n} \sum_{ij} \mathbf{Q}_{ij}\, \varvec{\tau}_{ij} \varvec{\tau}_{ij}^T \mathbf{W}
= \frac{2}{n} \sum_{ij} \mathbf{R}_{ij} (\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T \mathbf{W}
= \frac{4}{n} \left( \mathbf{X}^T \mathbf{S} \mathbf{X} - \mathbf{X}^T \mathbf{R} \mathbf{X} \right) \mathbf{W}
= \frac{4}{n}\, \mathbf{X}^T \mathbf{L} \mathbf{X} \mathbf{W},
\]
where each row of the data matrix \(\mathbf{X}\) is a data point \(\mathbf{x}_i\) and \(\mathbf{L} = \mathbf{S} - \mathbf{R}\) is the Laplacian matrix.
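As a quick plausibility check of the identity above, the following NumPy snippet compares the element-wise sum form of the gradient with the compact form \(\frac{4}{n}\,\mathbf{X}^T \mathbf{L} \mathbf{X} \mathbf{W}\). The neighbourhood weights G and the response-space distances Delta are arbitrary stand-ins; only the algebraic equivalence is being verified.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 20, 6, 2
X = rng.standard_normal((n, d))               # rows are data points x_i
W = rng.standard_normal((d, r))
G = rng.random((n, n))                        # stand-in neighbourhood weights
Z = X @ W
D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # D_ij = ||W^T x_i - W^T x_j||^2
Delta = rng.random((n, n))                    # stand-in for response-space distances

# Element-wise sum form of the gradient
grad_sum = np.zeros_like(W)
for i in range(n):
    for j in range(n):
        tau = (X[i] - X[j])[:, None]          # tau_ij = x_i - x_j
        grad_sum += G[i, j] * (D[i, j] - Delta[i, j]) * (tau @ tau.T) @ W
grad_sum *= 4.0 / n

# Compact form: (4/n) X^T L X W with L = S - R, R = Q + Q^T, Q = G ⊙ (D - Delta)
Q = G * (D - Delta)
R = Q + Q.T
L = np.diag(R.sum(axis=1)) - R
grad_compact = (4.0 / n) * X.T @ L @ X @ W

print(np.allclose(grad_sum, grad_compact))    # True
```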
Cite this article
Zhu, Z., Similä, T. & Corona, F. Supervised Distance Preserving Projections. Neural Process Lett 38, 445–463 (2013). https://doi.org/10.1007/s11063-013-9285-x