DOI: 10.1145/3292500.3330834
research-article

AtSNE: Efficient and Robust Visualization on GPU through Hierarchical Optimization

Published: 25 July 2019

Abstract

Visualization of high-dimensional data is a fundamental yet challenging problem in data mining. Such visualization techniques are commonly used to reveal patterns in high-dimensional data, such as clusters and the similarity among clusters. Recently, some successful visualization tools (e.g., BH-t-SNE and LargeVis) have been developed. However, they have two limitations: (1) They cannot capture the global data structure well, so their visualization results are sensitive to initialization, which may cause confusion in data analysis. (2) They cannot scale to large datasets, and they are not well suited to GPU implementation because their complex algorithm logic, high memory cost, and random memory access patterns lead to low hardware utilization. To address these problems, we propose a novel visualization approach named Anchor-t-SNE (AtSNE), which provides an efficient GPU-based visualization solution for large-scale, high-dimensional data. Specifically, we generate a number of anchor points from the original data and regard them as the skeleton of the layout, which holds the global structure information. We propose a hierarchical optimization approach that optimizes the positions of the anchor points and the ordinary data points in the layout simultaneously. Our approach produces much better and more robust visual effects on 11 public datasets and achieves a 5 to 28 times speed-up on different datasets compared with the current state-of-the-art methods. In particular, we deliver a high-quality 2-D layout for a dataset of 20 million 96-dimensional points within 5 hours, while the current methods fail to give results because they run out of memory.
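
The anchor idea described in the abstract can be illustrated with a small sketch. This is not the AtSNE algorithm: the abstract does not say how anchors are generated or optimized, so the sketch assumes k-means centroids as anchors, uses a plain PCA projection as a stand-in for the anchor layout, and only places each ordinary point next to its anchor as an initialization. All function names and parameters (`kmeans_anchors`, `pca_2d`, `anchor_skeleton_layout`, `jitter`) are hypothetical, and the code runs on CPU with NumPy rather than on a GPU.

```python
import numpy as np


def kmeans_anchors(X, n_anchors, n_iter=10, seed=0):
    """Pick anchors as k-means centroids (an assumption, not the paper's method)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_anchors, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(n_anchors):
            members = X[labels == k]
            if len(members):  # leave a center unchanged if its cluster is empty
                centers[k] = members.mean(axis=0)
    # Final assignment of each point to its nearest anchor.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


def pca_2d(A):
    """Project rows of A onto their top two principal directions."""
    A0 = A - A.mean(axis=0)
    _, _, vt = np.linalg.svd(A0, full_matrices=False)
    return A0 @ vt[:2].T


def anchor_skeleton_layout(X, n_anchors=10, jitter=1e-2, seed=0):
    """Lay out anchors first, then place each point near its anchor."""
    rng = np.random.default_rng(seed)
    anchors, labels = kmeans_anchors(X, n_anchors, seed=seed)
    anchor_xy = pca_2d(anchors)  # stand-in for optimizing the anchor layout
    noise = jitter * rng.standard_normal((len(X), 2))
    point_xy = anchor_xy[labels] + noise  # initialization only
    return anchor_xy, point_xy, labels


# Toy data: three well-separated Gaussian blobs in 16 dimensions.
rng = np.random.default_rng(1)
X = np.concatenate(
    [rng.normal(loc=c, scale=0.1, size=(100, 16)) for c in (0.0, 5.0, 10.0)]
)
anchor_xy, point_xy, labels = anchor_skeleton_layout(X, n_anchors=6)
print(anchor_xy.shape, point_xy.shape)
```

In this sketch the anchors carry the coarse arrangement of the clusters, and the per-point positions inherit it. A full method in the spirit of the abstract would then refine anchor and point positions jointly rather than stopping at this initialization.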

    Published In

    KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2019
    3305 pages
    ISBN:9781450362016
    DOI:10.1145/3292500
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. GPU
    2. high-dimensional visualization
    3. large-scale data

    Qualifiers

    • Research-article

    Conference

    KDD '19
    Acceptance Rates

    KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 31
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 28 Feb 2025

    Cited By

    • (2024) A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering. IEEE Transactions on Visualization and Computer Graphics 30:1 (770-780). DOI: 10.1109/TVCG.2023.3327201. Online publication date: 1-Jan-2024
    • (2024) Balancing Between the Local and Global Structures (LGS) in Graph Embedding. Graph Drawing and Network Visualization (263-279). DOI: 10.1007/978-3-031-49272-3_18. Online publication date: 1-Jan-2024
    • (2023) A Gaussian Process Decoder with Spectral Mixtures and a Locally Estimated Manifold for Data Visualization. Applied Sciences 13:14 (8018). DOI: 10.3390/app13148018. Online publication date: 9-Jul-2023
    • (2023) Natural Language Generation Meets Data Visualization: Vis-to-Text and its Duality with Text-to-Vis. 2023 IEEE International Conference on Data Mining (ICDM) (1337-1342). DOI: 10.1109/ICDM58522.2023.00171. Online publication date: 1-Dec-2023
    • (2023) SpectralMAP: Approximating Data Manifold With Spectral Decomposition. IEEE Access 11 (31530-31540). DOI: 10.1109/ACCESS.2023.3257427. Online publication date: 2023
    • (2023) Optimization of t-SNE by Tuning Perplexity for Dimensionality Reduction in NLP. Proceedings of International Conference on Communication and Computational Technologies (519-528). DOI: 10.1007/978-981-99-3485-0_41. Online publication date: 1-Sep-2023
    • (2022) Interpretable machine learning: Fundamental principles and 10 grand challenges. Statistics Surveys 16. DOI: 10.1214/21-SS133. Online publication date: 1-Jan-2022
    • (2022) Uniform Manifold Approximation with Two-phase Optimization. 2022 IEEE Visualization and Visual Analytics (VIS) (80-84). DOI: 10.1109/VIS54862.2022.00025. Online publication date: Oct-2022
    • (2022) Interactive Visual Cluster Analysis by Contrastive Dimensionality Reduction. IEEE Transactions on Visualization and Computer Graphics (1-11). DOI: 10.1109/TVCG.2022.3209423. Online publication date: 2022
    • (2022) Measuring and Explaining the Inter-Cluster Reliability of Multidimensional Projections. IEEE Transactions on Visualization and Computer Graphics 28:1 (551-561). DOI: 10.1109/TVCG.2021.3114833. Online publication date: Jan-2022
