On the Role and Impact of the Metaparameters in t-distributed Stochastic Neighbor Embedding

Lee, John A.; Verleysen, Michel

doi:10.1007/978-3-7908-2604-3_31

Conference paper
First Online: 01 January 2010

Abstract

Similarity-based embedding is a paradigm that has recently gained interest in the field of nonlinear dimensionality reduction. It provides an elegant framework that naturally emphasizes the preservation of the local structure of the data set. An emblematic method in this trend is t-distributed stochastic neighbor embedding (t-SNE), which the recent literature acknowledges as an efficient method. This paper analyzes the reasons for this success, together with the impact of the two metaparameters embedded in the method. Moreover, the paper shows that t-SNE can be interpreted as a distance-preserving method with a specific distance transformation, which links it to existing methods. Experiments on artificial data support the theoretical discussion.
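The abstract does not name the two metaparameters. The NumPy sketch below rests on an assumption that is not stated in the abstract: the two metaparameters are (1) the perplexity that calibrates the Gaussian input similarities and (2) the number of degrees of freedom of the Student t kernel that defines the output similarities. With one degree of freedom, the output kernel is the Cauchy kernel of standard t-SNE (van der Maaten and Hinton, 2008). All function names are hypothetical. The sketch only evaluates the t-SNE cost for a fixed embedding and omits the gradient descent that the method actually performs.

```python
import numpy as np


def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def conditional_probs(sq_dists_row, perplexity, tol=1e-5, max_iter=200):
    """Gaussian p_{j|i} over the neighbors of point i (self excluded).
    The precision beta = 1 / (2 sigma_i^2) is found by bisection so that
    exp(entropy) matches the requested perplexity (hypothetical helper)."""
    target = np.log(perplexity)  # target entropy in nats
    beta, lo, hi = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        logits = -beta * sq_dists_row
        p = np.exp(logits - logits.max())  # shift for numerical stability
        p /= p.sum()
        entropy = -np.sum(p * np.log(np.maximum(p, 1e-300)))
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # distribution too flat: increase the precision
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:  # distribution too peaked: decrease the precision
            hi = beta
            beta = (beta + lo) / 2.0
    return p


def input_affinities(X, perplexity):
    """Symmetrized joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = X.shape[0]
    D = pairwise_sq_dists(X)
    P = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i
        P[i, others] = conditional_probs(D[i, others], perplexity)
    return (P + P.T) / (2.0 * n)


def output_similarities(Y, dof=1.0):
    """Joint q_ij from a Student t kernel with `dof` degrees of freedom,
    (1 + d^2 / dof) ** (-(dof + 1) / 2); dof = 1 gives the Cauchy kernel
    of standard t-SNE."""
    W = (1.0 + pairwise_sq_dists(Y) / dof) ** (-(dof + 1.0) / 2.0)
    np.fill_diagonal(W, 0.0)
    return W / W.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the cost that t-SNE minimizes with respect to Y."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))  # hypothetical high-dimensional data
Y = rng.normal(size=(30, 2))  # hypothetical (unoptimized) 2-D embedding
P = input_affinities(X, perplexity=5.0)
for nu in (0.5, 1.0, 10.0):
    print(nu, kl_divergence(P, output_similarities(Y, dof=nu)))
```

Lowering `dof` makes the output kernel heavier-tailed, which lets moderately dissimilar points sit farther apart in the embedding. As `dof` grows, the kernel approaches a Gaussian, close to the one used by the original SNE. The perplexity plays the role of a soft neighborhood size in the input space.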

Author information

Authors and Affiliations

Imagerie Moléculaire et Radiothérapie Expérimentale, Avenue Hippocrate 54, B-1200, Brussels, Belgium
John A. Lee
Machine Learning Group - DICE, Place du Levant 3, B-1348, Louvain-la-Neuve, Belgium
Michel Verleysen

Corresponding author

Correspondence to John A. Lee.

Editor information

Editors and Affiliations

Centre de Recherche INRIA Paris-Rocquenc, Domaine de Voluceau, Le Chesnay cedex, 78153, France
Yves Lechevallier
Chaire de statistique appliquée, CNAM, rue Saint Martin 292, Paris, 75003, France
Gilbert Saporta

About this paper

Cite this paper

Lee, J.A., Verleysen, M. (2010). On the Role and Impact of the Metaparameters in t-distributed Stochastic Neighbor Embedding. In: Lechevallier, Y., Saporta, G. (eds) Proceedings of COMPSTAT'2010. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2604-3_31

DOI: https://doi.org/10.1007/978-3-7908-2604-3_31
Published: 30 September 2010
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-2603-6
Online ISBN: 978-3-7908-2604-3
eBook Packages: Mathematics and Statistics (R0)