Knowledge-Based Systems

Volume 206, 28 October 2020, 106318

Fast t-SNE algorithm with forest of balanced LSH trees and hybrid computation of repulsive forces

https://doi.org/10.1016/j.knosys.2020.106318

Abstract

An acceleration of the well-known t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm (Hinton and Roweis, 2003; Maaten and Hinton, 2008), probably the best (nonlinear) dimensionality reduction and visualization method, is proposed in this article.

By using a specially tuned forest of balanced trees constructed via locality-sensitive hashing, we improve significantly upon the results presented in Maaten (2014), achieving a complexity much closer to the true O(n log n) and vastly improving behavior for huge numbers of instances and attributes. This acceleration removes the need to reduce dimensionality with PCA before t-SNE starts.

Additionally, a fast hybrid method for computing the repulsive forces (a part of the t-SNE algorithm) is proposed, which is currently the fastest method known.

A parallelized version of our algorithm, characterized by a very good speedup factor, is also proposed.

Introduction

Data visualization is an important task in machine learning, as people prefer visual representations of data over numerical ones.

We can enumerate simple (in terms of the underlying mathematics) methods like scatter plots, parallel coordinates, histograms, heat maps, survey plots, dimensional stacking, RadViz, etc. [1], [2], but such methods are useful for detecting only some (simple) regularities in data.

One of the most universally known methods of dimensionality reduction is Principal Component Analysis (PCA) [3]. PCA selects the directions of highest variance in the given data and yields a linear, orthogonal transformation. However, such a transformation from a multidimensional space to 2D or 3D will rarely produce readable shapes, because a strong reduction cannot preserve all the initial distances of a high-dimensional space. An advantage of PCA is its $O(mn^2)$ complexity, where m is the number of instances and n is the number of attributes. The well-known kernel trick [4] turns linear PCA into nonlinear kernel PCA, an idea proposed by Schölkopf [5].
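As an illustration of this step, the following minimal sketch (ours, not the authors' code; numpy only) computes such a projection via the eigendecomposition of the covariance matrix:

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X (m instances x n attributes) onto the
    k principal directions of highest variance."""
    Xc = X - X.mean(axis=0)               # center the data
    C = Xc.T @ Xc / (X.shape[0] - 1)      # n x n covariance matrix, the O(mn^2) step
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues returned in ascending order
    top = eigvecs[:, ::-1][:, :k]         # eigenvectors of the k largest eigenvalues
    return Xc @ top                       # m x k embedding

# Example: Y = pca_project(np.random.rand(1000, 50), k=2)
```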

Multidimensional scaling (MDS) was one of the first very practical nonlinear dimensionality reduction methods [6]. Let us assume we have a dataset D consisting of vectors $x_1, \dots, x_m$ in $\mathbb{R}^n$, and let $y_i$ be the point in the low-dimensional space corresponding to $x_i$. The goal of MDS is to preserve the distances of the high-dimensional space in the low-dimensional space, using the following definition of a cost function: $$C_{MDS} = \frac{1}{\sum_{i<j} \|x_i - x_j\|^2} \sum_{i<j}^{m} \left( \|x_i - x_j\| - \|y_i - y_j\| \right)^2.$$ Typically, PCA is used to initialize the positions of the low-dimensional points $y_i$. The complexity of MDS is $O(mn^2 + m^2 n + m^2 l)$, composed of the PCA step, a single computation of all distances in the high-dimensional space, and the main loop of MDS (with l iterations).
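A direct (unoptimized) evaluation of this cost function might look as follows; a minimal sketch assuming Euclidean distances, with names of our own choosing:

```python
import numpy as np
from scipy.spatial.distance import pdist

def mds_stress(X, Y):
    """Evaluate the MDS cost above: the normalized sum of squared
    differences between pairwise distances in the high-dimensional
    space (rows of X) and in the low-dimensional space (rows of Y)."""
    d_high = pdist(X)  # all m(m-1)/2 high-dimensional distances, the O(m^2 n) step
    d_low = pdist(Y)   # corresponding low-dimensional distances
    return np.sum((d_high - d_low) ** 2) / np.sum(d_high ** 2)
```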

The Sammon mapping [7] can be seen as a variant of MDS, and its goal is similar to that of MDS.
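The exact stress function is not reproduced in this snippet; for reference, the standard definition from [7] weights each pairwise error by the inverse of the original distance, which emphasizes the preservation of small distances:

$$E = \frac{1}{\sum_{i<j} \|x_i - x_j\|} \sum_{i<j} \frac{\left( \|x_i - x_j\| - \|y_i - y_j\| \right)^2}{\|x_i - x_j\|}.$$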

A neural approach called Self-Organizing Maps (SOM) was proposed by Kohonen [8]. SOM creates a map/grid from the original data. Another nonlinear, neural approach was the Neural Gas [9], [10].

The IsoMap [11] algorithm is based on the idea of the geodesic distance. A closely related graph-based approach was proposed in [12]. Both of the above methods are locally oriented, as is another one, Locally Linear Embedding [13].
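To make the geodesic idea concrete, here is a minimal sketch (not from the paper) that approximates geodesic distances as shortest paths over a k-nearest-neighbor graph:

```python
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(X, k=10):
    """Approximate geodesic distances: connect each point to its k
    nearest neighbors with Euclidean-weighted edges, then take
    all-pairs shortest paths over the resulting graph."""
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')
    return shortest_path(G, method='D', directed=False)  # Dijkstra

# IsoMap then applies classical MDS to this geodesic distance matrix.
```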

Stochastic Neighbor Embedding (SNE) was proposed by Hinton and Roweis [14]. SNE and its variants are the main subjects of this article. In [15], the authors described two important improvements. Another type of acceleration in the domain of document analysis was proposed in [16].

One of the newer approaches to dimensionality reduction is LargeVis, proposed by Tang et al. [17]. This algorithm is based on the construction of an approximate K-nearest-neighbor graph. However, while LargeVis is faster than tree-based t-SNE (around two times), it is significantly slower than our algorithm, since LargeVis is also slower than UMAP.

Another new approach to dimensionality reduction is UMAP [18], which, like t-SNE, uses a cross-entropy cost function. The speed of this algorithm is compared with our results in Section 4.

For a review of dimensionality reduction methods, please see [19], where several algorithms are presented along with short descriptions, additional remarks, and a comparison. The review shows that most of these algorithms are highly time-consuming.

The next section describes Stochastic Neighbor Embedding and its improvements in more detail. The section after that describes several accelerations of t-SNE which exhibit better complexity. Section 4 presents several testing procedures and an analysis of the obtained results, which show the superiority of the proposed methods.

Section snippets

Previous work on t-Distributed stochastic neighbor embedding

First assume there is a dataset $D = \{x_1, \dots, x_m\}$, $x_i \in \mathbb{R}^n$ (m is the number of instances and n is the number of attributes). t-Distributed Stochastic Neighbor Embedding (t-SNE) is a direct extension of the work on Stochastic Neighbor Embedding [14]. The description of t-SNE starts by defining two probabilities, the first for the base dissimilarity in the high-dimensional space: $$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)},$$ where $\sigma_i$ is selected by a binary search so that the probability distribution $P_i$ reaches a fixed perplexity
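That binary search can be sketched as follows; a minimal illustration assuming the usual definition of perplexity as $2^{H(P_i)}$, with $H$ the Shannon entropy (function and variable names are ours, not the authors'):

```python
import numpy as np

def sigma_for_perplexity(dist2_i, perplexity=30.0, tol=1e-5, max_iter=50):
    """Binary-search the Gaussian bandwidth sigma_i so that the
    conditional distribution p_{j|i}, built from the squared distances
    dist2_i (row i of the distance matrix with the i-th entry removed),
    reaches the requested perplexity 2^H(P_i)."""
    lo, hi = 1e-20, 1e20
    target = np.log2(perplexity)                  # desired entropy in bits
    for _ in range(max_iter):
        sigma = np.sqrt(lo * hi)                  # geometric midpoint of the bracket
        # subtract the min distance before exponentiating, for numerical stability
        p = np.exp(-(dist2_i - dist2_i.min()) / (2.0 * sigma ** 2))
        p /= p.sum()                              # p_{j|i} for this sigma
        entropy = -np.sum(p * np.log2(p + 1e-30))
        if abs(entropy - target) < tol:
            break
        if entropy > target:                      # distribution too flat: shrink sigma
            hi = sigma
        else:                                     # too peaked: enlarge sigma
            lo = sigma
    return sigma, p
```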

Stochastic neighbor embedding with a forest of locality-sensitive hashing trees and hybrid computation of repulsive forces

This section presents three ways of speeding up t-SNE. In consequence, all elements of t-SNE become significantly faster.

  • First, a dedicated type of locality-sensitive hashing tree is introduced; it is used for the attractive-forces computation in t-SNE in place of the Vantage-point tree and yields much better computational costs (an illustrative sketch follows this list).

  • Second, a hybrid computation of the repulsive forces is proposed, combining the Barnes–Hut approximation with piecewise polynomial interpolation, which also reduces computational costs (see the second sketch after this list).
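The authors' dedicated balanced LSH trees are specified in the full text; purely as an illustration of the general principle (not the paper's exact construction), a balanced random-projection LSH tree can be built by recursively splitting the points at the median of their projections onto a random direction:

```python
import numpy as np

def build_lsh_tree(X, indices=None, leaf_size=32, rng=None):
    """Illustrative balanced random-projection LSH tree: split the points
    at the median of their projections onto a random direction and recurse.
    The median split keeps the tree perfectly balanced; this is a generic
    construction, not the paper's exact algorithm."""
    rng = np.random.default_rng() if rng is None else rng
    if indices is None:
        indices = np.arange(len(X))
    if len(indices) <= leaf_size:
        return {'leaf': indices}                 # bucket of candidate neighbors
    direction = rng.standard_normal(X.shape[1])  # random hyperplane normal
    proj = X[indices] @ direction
    order = np.argsort(proj)
    half = len(indices) // 2                     # median split -> balance
    return {'dir': direction,
            'threshold': proj[order[half]],
            'left': build_lsh_tree(X, indices[order[:half]], leaf_size, rng),
            'right': build_lsh_tree(X, indices[order[half:]], leaf_size, rng)}
```

A forest is obtained by building several such trees with independent random directions; the union of the leaves a query point falls into supplies the candidate nearest neighbors.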
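The hybrid repulsive-force scheme itself is described in the full text. Its Barnes–Hut component relies on a standard far-field acceptance test over a quadtree of the embedding points, sketched below under the usual t-SNE conventions (Student-t kernel $1/(1+d^2)$); the interpolation component and the exact switching rule between the two methods are the paper's contribution and are not reproduced here:

```python
import numpy as np

def far_cell_repulsion(y_i, cell_center, cell_size, cell_count, theta=0.5):
    """Barnes-Hut far-field test for the t-SNE repulsion (illustrative only,
    not the paper's hybrid scheme). If a quadtree cell is far enough from
    the embedding point y_i (cell diameter / distance < theta), all of its
    cell_count points are summarized by the cell's center of mass."""
    diff = y_i - cell_center
    dist = np.sqrt(diff @ diff)
    if cell_size / dist < theta:             # far enough: approximate the whole cell
        q = 1.0 / (1.0 + dist ** 2)          # Student-t kernel
        force = cell_count * q ** 2 * diff   # unnormalized repulsive-force term
        z_part = cell_count * q              # contribution to the normalizer Z
        return force, z_part
    return None  # too close: the caller must descend into the cell's children
```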

Experiments and result analysis

The same benchmark datasets and a similar method of comparing results as presented by Maaten [20] and Linderman et al. [23], together with a few additional ones, are used to present a trustworthy comparison of the different versions of the t-SNE algorithm.

Conclusions

A few important acceleration approaches to the well-known t-SNE algorithm, the best-known method in terms of the complexity and accuracy of the resulting dimensionality reduction, were presented. First, the LSHF-BH t-SNE algorithm was presented, which is significantly faster than the previous t-SNE variant proposed by Maaten [20]. Such a forest of dedicated balanced LSH trees is much faster than a Vantage-point tree or Annoy for nearest-neighbor computation.

The LSHF-Hybrid t-SNE algorithm was

CRediT authorship contribution statement

Marek Orliński: Conceptualization, Methodology, Software, Data curation, Writing, Visualization. Norbert Jankowski: Conceptualization, Methodology, Data curation, Writing, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (40)

  • S. Ingram et al., Dimensionality reduction for documents with nearest neighbor queries, Neurocomputing (2015)
  • G. Grinstein et al., High-dimensional visualizations
  • E. Bertini et al., Quality metrics in high-dimensional data visualization: An overview and systematization, IEEE Trans. Vis. Comput. Graphics (2011)
  • H. Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psychol. (1933)
  • B.E. Boser et al., A training algorithm for optimal margin classifiers
  • B. Schölkopf, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. (1998)
  • W. Torgerson, Multidimensional scaling I: Theory and method, Psychometrika (1952)
  • J. Sammon, A nonlinear mapping for data structure analysis, IEEE Trans. Comput. (1969)
  • T. Kohonen, Self-Organizing Maps (1995)
  • B. Fritzke, A growing neural gas network learns topologies
  • B. Fritzke, A self-organizing network that can follow non-stationary distributions
  • J. Tenenbaum, Mapping a manifold of perceptual observations
  • K. Weinberger et al., Learning a kernel matrix for nonlinear dimensionality reduction
  • S. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
  • G. Hinton et al., Stochastic neighbor embedding
  • L. Maaten et al., Visualizing data using t-SNE, J. Mach. Learn. Res. (2008)
  • J. Tang, J. Liu, M. Zhang, Q. Mei, Visualizing large-scale and high-dimensional data, in: Proceedings of the 25th...
  • L. McInnes et al., UMAP: Uniform manifold approximation and projection for dimension reduction (2018)
  • L. van der Maaten et al., Dimensionality Reduction: A Comparative Review, Tech. Rep. TiCC-TR 2009-005 (2009)
  • L. Maaten, Accelerating t-SNE using tree-based algorithms, J. Mach. Learn. Res. (2014)