Knowledge-Based Systems

Volume 206, 28 October 2020, 106318

Fast t-SNE algorithm with forest of balanced LSH trees and hybrid computation of repulsive forces

https://doi.org/10.1016/j.knosys.2020.106318

Abstract

An acceleration of the well-known t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm (Hinton and Roweis, 2003; Maaten and Hinton, 2008), probably the best (nonlinear) dimensionality reduction and visualization method, is proposed in this article.

By using a specially tuned forest of balanced trees constructed via locality-sensitive hashing, we improve significantly upon the results presented in Maaten (2014), achieving a complexity much closer to the true O(n log n) and vastly improving behavior for huge numbers of instances and attributes. This acceleration removes the need to reduce dimensionality with PCA before t-SNE starts.

Additionally, a fast hybrid method for computing the repulsive forces (a part of the t-SNE algorithm) is proposed, which is currently the fastest method known.

A parallelized version of our algorithm, characterized by a very good speedup factor, is also proposed.

Introduction

Data visualization is an important task in machine learning, as people prefer visual representations of data over numerical ones.

We can enumerate simple (in terms of the underlying mathematics) methods like scatter plots, parallel coordinates, histograms, heat maps, survey plots, dimensional stacking, RadViz, etc. [1], [2], but such methods are useful for detecting only some (simple) regularities in data.

One of the most universally known methods of dimensionality reduction is Principal Component Analysis (PCA) [3]. PCA selects the directions of highest variance in the given data and yields a linear, orthogonal transformation. However, such a transformation from a multidimensional space to 2D or 3D will rarely produce readable shapes, because a strong reduction cannot preserve all the initial distances of a high-dimensional space. An advantage of PCA is its $O(mn^2)$ complexity, where m is the number of instances and n is the number of attributes. The well-known kernel trick [4] turns linear PCA into nonlinear kernel PCA, an idea proposed by Schölkopf [5].
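As an illustration of this step, the following minimal sketch (ours, not the authors' code; numpy only) computes such a projection via the eigendecomposition of the covariance matrix:

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X (m instances x n attributes) onto the
    k principal directions of highest variance."""
    Xc = X - X.mean(axis=0)               # center the data
    C = Xc.T @ Xc / (X.shape[0] - 1)      # n x n covariance matrix, the O(mn^2) step
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues returned in ascending order
    top = eigvecs[:, ::-1][:, :k]         # eigenvectors of the k largest eigenvalues
    return Xc @ top                       # m x k embedding

# Example: Y = pca_project(np.random.rand(1000, 50), k=2)
```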

Multidimensional scaling (MDS) was one of the first very practical nonlinear dimensionality reduction methods [6]. Let us assume we have a dataset D consisting of vectors $x_1, \dots, x_m$ in $\mathbb{R}^n$, and let $y_i$ be the point in the low-dimensional space corresponding to $x_i$. The goal of MDS is to preserve the distances of the high-dimensional space in the low-dimensional space, using the following definition of a cost function: $$C_{MDS} = \frac{1}{\sum_{i<j} \|x_i - x_j\|^2} \sum_{i<j}^{m} \left( \|x_i - x_j\| - \|y_i - y_j\| \right)^2.$$ Typically, PCA is used to initialize the positions of the low-dimensional points $y_i$. The complexity of MDS is $O(mn^2 + m^2 n + m^2 l)$, composed of the PCA step, a single computation of all distances in the high-dimensional space, and the main loop of MDS (with l iterations).
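A direct (unoptimized) evaluation of this cost function might look as follows; a minimal sketch assuming Euclidean distances, with names of our own choosing:

```python
import numpy as np
from scipy.spatial.distance import pdist

def mds_stress(X, Y):
    """Evaluate the MDS cost above: the normalized sum of squared
    differences between pairwise distances in the high-dimensional
    space (rows of X) and in the low-dimensional space (rows of Y)."""
    d_high = pdist(X)  # all m(m-1)/2 high-dimensional distances, the O(m^2 n) step
    d_low = pdist(Y)   # corresponding low-dimensional distances
    return np.sum((d_high - d_low) ** 2) / np.sum(d_high ** 2)
```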

The Sammon mapping [7] can be seen as a variant of MDS, and its goal is similar to that of MDS.
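The exact stress function is not reproduced in this snippet; for reference, the standard definition from [7] weights each pairwise error by the inverse of the original distance, which emphasizes the preservation of small distances:

$$E = \frac{1}{\sum_{i<j} \|x_i - x_j\|} \sum_{i<j} \frac{\left( \|x_i - x_j\| - \|y_i - y_j\| \right)^2}{\|x_i - x_j\|}.$$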

A neural approach called Self-Organizing Maps (SOM) was proposed by Kohonen [8]. SOM creates a map/grid from the original data. Another nonlinear, neural approach was the Neural Gas [9], [10].

The IsoMap [11] algorithm is based on the idea of the geodesic distance. A closely related graph-based approach was proposed in [12]. Both of the above methods are locally oriented, as is another one, Locally Linear Embedding [13].
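To make the geodesic idea concrete, here is a minimal sketch (not from the paper) that approximates geodesic distances as shortest paths over a k-nearest-neighbor graph:

```python
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(X, k=10):
    """Approximate geodesic distances: connect each point to its k
    nearest neighbors with Euclidean-weighted edges, then take
    all-pairs shortest paths over the resulting graph."""
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')
    return shortest_path(G, method='D', directed=False)  # Dijkstra

# IsoMap then applies classical MDS to this geodesic distance matrix.
```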

Stochastic Neighbor Embedding (SNE) was proposed by Hinton and Roweis [14]. SNE and its variants are the main subjects of this article. In [15], the authors described two important improvements. Another type of acceleration in the domain of document analysis was proposed in [16].

One of the newer approaches to dimensionality reduction is LargeVis, proposed by Tang et al. [17]. This algorithm is based on the construction of an approximate K-nearest-neighbor graph. However, while LargeVis is faster than tree-based t-SNE (around two times), it is significantly slower than our algorithm, since LargeVis is also slower than UMAP.

Another new approach to dimensionality reduction is UMAP [18], which, like t-SNE, uses a cross-entropy cost function. The speed of this algorithm is compared with our results in Section 4.

For a review of dimensionality reduction methods, please see [19], where several algorithms are presented along with short descriptions, additional remarks, and a comparison. The review shows that most of these algorithms are highly time-consuming.

The next section describes Stochastic Neighbor Embedding and its improvements in more detail. The section after that describes several accelerations of t-SNE which exhibit better complexity. Section 4 presents several testing procedures and an analysis of the obtained results, which show the superiority of the proposed methods.

Section snippets

Previous work on t-Distributed stochastic neighbor embedding

First assume there is a dataset $D = \{x_1, \dots, x_m\}$, $x_i \in \mathbb{R}^n$ (m is the number of instances and n is the number of attributes). t-Distributed Stochastic Neighbor Embedding (t-SNE) is a direct extension of the work on Stochastic Neighbor Embedding [14]. The description of t-SNE starts by defining two probabilities, the first for the base dissimilarity in the high-dimensional space: $$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)},$$ where $\sigma_i$ is selected by a binary search so that the probability distribution $P_i$ reaches a fixed perplexity
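That binary search can be sketched as follows; a minimal illustration assuming the usual definition of perplexity as $2^{H(P_i)}$, with $H$ the Shannon entropy (function and variable names are ours, not the authors'):

```python
import numpy as np

def sigma_for_perplexity(dist2_i, perplexity=30.0, tol=1e-5, max_iter=50):
    """Binary-search the Gaussian bandwidth sigma_i so that the
    conditional distribution p_{j|i}, built from the squared distances
    dist2_i (row i of the distance matrix with the i-th entry removed),
    reaches the requested perplexity 2^H(P_i)."""
    lo, hi = 1e-20, 1e20
    target = np.log2(perplexity)                  # desired entropy in bits
    for _ in range(max_iter):
        sigma = np.sqrt(lo * hi)                  # geometric midpoint of the bracket
        # subtract the min distance before exponentiating, for numerical stability
        p = np.exp(-(dist2_i - dist2_i.min()) / (2.0 * sigma ** 2))
        p /= p.sum()                              # p_{j|i} for this sigma
        entropy = -np.sum(p * np.log2(p + 1e-30))
        if abs(entropy - target) < tol:
            break
        if entropy > target:                      # distribution too flat: shrink sigma
            hi = sigma
        else:                                     # too peaked: enlarge sigma
            lo = sigma
    return sigma, p
```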

Stochastic neighbor embedding with a forest of locality-sensitive hashing trees and hybrid computation of repulsive forces

This section presents three ways of speeding up t-SNE. In consequence, all elements of t-SNE become significantly faster.

  • First, a dedicated type of locality-sensitive hashing tree is introduced; it is used for the attractive-forces computation in t-SNE in place of the Vantage-point tree and yields much better computational costs (an illustrative sketch follows this list).

  • Second, a hybrid computation of the repulsive forces is proposed, combining the Barnes–Hut approximation with piecewise polynomial interpolation, which also reduces computational costs (see the second sketch after this list).
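The authors' dedicated balanced LSH trees are specified in the full text; purely as an illustration of the general principle (not the paper's exact construction), a balanced random-projection LSH tree can be built by recursively splitting the points at the median of their projections onto a random direction:

```python
import numpy as np

def build_lsh_tree(X, indices=None, leaf_size=32, rng=None):
    """Illustrative balanced random-projection LSH tree: split the points
    at the median of their projections onto a random direction and recurse.
    The median split keeps the tree perfectly balanced; this is a generic
    construction, not the paper's exact algorithm."""
    rng = np.random.default_rng() if rng is None else rng
    if indices is None:
        indices = np.arange(len(X))
    if len(indices) <= leaf_size:
        return {'leaf': indices}                 # bucket of candidate neighbors
    direction = rng.standard_normal(X.shape[1])  # random hyperplane normal
    proj = X[indices] @ direction
    order = np.argsort(proj)
    half = len(indices) // 2                     # median split -> balance
    return {'dir': direction,
            'threshold': proj[order[half]],
            'left': build_lsh_tree(X, indices[order[:half]], leaf_size, rng),
            'right': build_lsh_tree(X, indices[order[half:]], leaf_size, rng)}
```

A forest is obtained by building several such trees with independent random directions; the union of the leaves a query point falls into supplies the candidate nearest neighbors.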
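The hybrid repulsive-force scheme itself is described in the full text. Its Barnes–Hut component relies on a standard far-field acceptance test over a quadtree of the embedding points, sketched below under the usual t-SNE conventions (Student-t kernel $1/(1+d^2)$); the interpolation component and the exact switching rule between the two methods are the paper's contribution and are not reproduced here:

```python
import numpy as np

def far_cell_repulsion(y_i, cell_center, cell_size, cell_count, theta=0.5):
    """Barnes-Hut far-field test for the t-SNE repulsion (illustrative only,
    not the paper's hybrid scheme). If a quadtree cell is far enough from
    the embedding point y_i (cell diameter / distance < theta), all of its
    cell_count points are summarized by the cell's center of mass."""
    diff = y_i - cell_center
    dist = np.sqrt(diff @ diff)
    if cell_size / dist < theta:             # far enough: approximate the whole cell
        q = 1.0 / (1.0 + dist ** 2)          # Student-t kernel
        force = cell_count * q ** 2 * diff   # unnormalized repulsive-force term
        z_part = cell_count * q              # contribution to the normalizer Z
        return force, z_part
    return None  # too close: the caller must descend into the cell's children
```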

Experiments and result analysis

The same benchmark datasets and a similar method of comparing results as presented by Maaten [20] and Linderman et al. [23], together with a few additional ones, are used to present a trustworthy comparison of the different versions of the t-SNE algorithm.

Conclusions

A few important acceleration approaches to the well-known t-SNE algorithm, the best-known method in terms of the complexity and accuracy of the resulting dimensionality reduction, were presented. First, the LSHF-BH t-SNE algorithm was presented, which is significantly faster than the previous t-SNE variant proposed by Maaten [20]. Such a forest of dedicated balanced LSH trees is much faster than a Vantage-point tree or Annoy for nearest-neighbor computation.

The LSHF-Hybrid t-SNE algorithm was

CRediT authorship contribution statement

Marek Orliński: Conceptualization, Methodology, Software, Data curation, Writing, Visualization. Norbert Jankowski: Conceptualization, Methodology, Data curation, Writing, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (40)

  • S. Ingram et al., Dimensionality reduction for documents with nearest neighbor queries, Neurocomputing (2015)
  • G. Grinstein et al., High-dimensional visualizations
  • E. Bertini et al., Quality metrics in high-dimensional data visualization: An overview and systematization, IEEE Trans. Vis. Comput. Graphics (2011)
  • H. Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psychol. (1933)
  • B.E. Boser et al., A training algorithm for optimal margin classifiers
  • B. Schölkopf, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. (1998)
  • W. Torgerson, Multidimensional scaling I: Theory and method, Psychometrika (1952)
  • J. Sammon, A nonlinear mapping for data structure analysis, IEEE Trans. Comput. (1969)
  • T. Kohonen, Self-Organizing Maps (1995)
  • B. Fritzke, A growing neural gas network learns topologies
  • B. Fritzke, A self-organizing network that can follow non-stationary distributions
  • J. Tenenbaum, Mapping a manifold of perceptual observations
  • K. Weinberger et al., Learning a kernel matrix for nonlinear dimensionality reduction
  • S. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
  • G. Hinton et al., Stochastic neighbor embedding
  • L. Maaten et al., Visualizing data using t-SNE, J. Mach. Learn. Res. (2008)
  • J. Tang, J. Liu, M. Zhang, Q. Mei, Visualizing large-scale and high-dimensional data, in: Proceedings of the 25th...
  • L. McInnes et al., UMAP: Uniform manifold approximation and projection for dimension reduction (2018)
  • L. van der Maaten et al., Dimensionality Reduction: A Comparative Review, Tech. Rep. TiCC-TR 2009-005 (2009)
  • L. Maaten, Accelerating t-SNE using tree-based algorithms, J. Mach. Learn. Res. (2014)