Abstract:
Dimensionality reduction is a fundamental technique for addressing the curse of dimensionality in real-world big datasets. However, most existing methods either target only raw datasets that contain explicit relationships between data points, or construct the complete neighborhood graph of the dataset by calculating pairwise similarities and then generate contexts of data points by random walks to measure the structure of the dataset, steps that are computationally expensive. In this paper, we propose a fast nonlinear locality-preserving dimensionality reduction approach called FVec2vec, which extends the Skip-gram model to the embedding representation of general numerical matrices. Specifically, instead of constructing a neighborhood graph by calculating pairwise similarities between data points, we approximate the k-nearest neighbors (kNN) of each data point in a matrix by first exploring its neighbors' neighbors. We then design a novel sampling algorithm that randomly samples from the kNN to depict the structure of the dataset. Experimental results show that FVec2vec is faster than most existing methods while achieving acceptable accuracy, and that its accuracy is even higher than that of the state-of-the-art method under certain similarity metrics.
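The two steps named in the abstract can be illustrated with a minimal sketch. It is not the FVec2vec implementation: the abstract does not give the algorithms, so the function names `approx_knn` and `sample_contexts`, the Euclidean metric, the random initialization, the iteration count, and the uniform sampling rule are all assumptions made for illustration. The first function refines random neighbor lists by checking only each point's neighbors' neighbors, which avoids computing all pairwise similarities. The second draws (center, context) pairs from the resulting kNN lists, in the form a Skip-gram trainer would consume.

```python
import numpy as np

def approx_knn(X, k, n_iters=5, seed=0):
    """Approximate kNN by exploring neighbors' neighbors (hypothetical sketch).

    Assumes Euclidean distance and X.shape[0] > k.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Start from k random, distinct neighbors per point, excluding the point itself.
    nbrs = np.empty((n, k), dtype=np.int64)
    for i in range(n):
        choices = rng.choice(n - 1, size=k, replace=False)
        choices[choices >= i] += 1  # shift indices so that i is never chosen
        nbrs[i] = choices
    for _ in range(n_iters):
        changed = False
        for i in range(n):
            # Candidates: current neighbors plus the neighbors of those neighbors.
            cand = np.union1d(nbrs[i], nbrs[nbrs[i]].ravel())
            cand = cand[cand != i]
            dist = np.linalg.norm(X[cand] - X[i], axis=1)
            best = cand[np.argsort(dist, kind="stable")[:k]]
            if set(best.tolist()) != set(nbrs[i].tolist()):
                changed = True
            nbrs[i] = best
        if not changed:  # stop early once no neighbor list improves
            break
    return nbrs

def sample_contexts(nbrs, n_samples, seed=0):
    """Draw (center, context) index pairs uniformly from each kNN list.

    Uniform sampling is an assumption; the paper's sampling rule may differ.
    """
    rng = np.random.default_rng(seed)
    n, k = nbrs.shape
    cols = rng.integers(0, k, size=(n, n_samples))
    contexts = np.take_along_axis(nbrs, cols, axis=1)
    centers = np.repeat(np.arange(n), n_samples)
    return np.column_stack([centers, contexts.ravel()])
```

Each refinement pass examines at most k + k² candidates per point, so its cost grows linearly with the number of points rather than quadratically. The resulting neighbor lists are approximate and may miss some true nearest neighbors.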
Date of Conference: 17-20 December 2022
Date Added to IEEE Xplore: 26 January 2023