On the Role and Impact of the Metaparameters in t-distributed Stochastic Neighbor Embedding

  • Conference paper

Abstract

Similarity-based embedding is a paradigm that has recently gained interest in the field of nonlinear dimensionality reduction. It provides an elegant framework that naturally emphasizes the preservation of the local structure of the data set. An emblematic method in this trend is t-distributed stochastic neighbor embedding (t-SNE), which is acknowledged to be an efficient method in the recent literature. This paper aims at analyzing the reasons for this success, together with the impact of the two metaparameters embedded in the method. Moreover, the paper shows that t-SNE can be interpreted as a distance-preserving method with a specific distance transformation, which links it to existing methods. Experiments on artificial data support the theoretical discussion.

Author information

Corresponding author

Correspondence to John A. Lee.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, J.A., Verleysen, M. (2010). On the Role and Impact of the Metaparameters in t-distributed Stochastic Neighbor Embedding. In: Lechevallier, Y., Saporta, G. (eds) Proceedings of COMPSTAT'2010. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2604-3_31
