
Dimensionality reduction by t-Distribution adaptive manifold embedding

Published in: Applied Intelligence

Abstract

High-dimensional data are difficult to explore and analyze because they are highly correlated and redundant. Although previous dimensionality reduction methods have achieved promising performance, they still have limitations: the constructed distribution of the data in the embedding space cannot be approximated adaptively, and the model parameters lack interpretability. To address these problems, this paper proposes a novel dimensionality reduction method named t-Distribution Adaptive Manifold Embedding (t-AME). First, t-AME constructs the pairwise distance similarity probabilities in the embedding space with a Student-t distribution, and the degrees of freedom of that distribution are learned from the data itself to better match the high-dimensional data distribution. Then, to pull similar points together and push dissimilar points apart, an objective function with a corresponding optimization strategy is designed, so that both the local and global structure of the original data are well preserved in the embedding space. Finally, numerical experiments on synthetic and real datasets show that the proposed method achieves a significant improvement over representative and state-of-the-art dimensionality reduction methods.
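The Student-t similarity construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and the degrees of freedom `nu` (which t-AME learns from the data) is treated here as a fixed parameter. With `nu = 1` the kernel reduces to the heavy-tailed similarity used by t-SNE.

```python
import numpy as np

def student_t_similarities(Y, nu=1.0):
    """Pairwise similarity probabilities for an embedding Y (n x d)
    using an unnormalised Student-t kernel with `nu` degrees of freedom."""
    # Squared Euclidean distances between all pairs of embedded points
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    # Student-t weight: (1 + ||y_i - y_j||^2 / nu)^(-(nu + 1) / 2)
    weights = (1.0 + sq_dists / nu) ** (-(nu + 1.0) / 2.0)
    np.fill_diagonal(weights, 0.0)  # a point is not its own neighbour
    return weights / weights.sum()  # normalise into a probability matrix

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))        # toy 2-D embedding of 5 points
Q = student_t_similarities(Y, nu=1.0)
```

A smaller `nu` gives the kernel heavier tails, which pushes moderately distant points further apart in the embedding; learning `nu` per dataset is what makes the fit adaptive in t-AME.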


Notes

  1. https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_s_curve.html.

  2. https://github.com/KrishnaswamyLab/PHATE/blob/master/data/TreeData.mat.

  3. https://github.com/YingfanWang/PaCMAP/blob/master/data/mammoth_3d.json.

  4. https://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php.

  5. https://www.cs.columbia.edu/CAVE/software/softlib/coil-100.php.

  6. https://www.kaggle.com/bistaumanga/usps-dataset.

  7. https://github.com/zalandoresearch/fashion-mnist.


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant nos. 12001057 and 61976174, the Fundamental Research Funds for the Central Universities in Chang'an University under Grant nos. 300102122101 and 300102120201, and the Key Research and Development Program of Shaanxi Province of China under Grant no. 2021NY-170.

Author information

Authors and Affiliations

Authors

Contributions

Changpeng Wang: Conceptualization, Writing - review & editing. Linlin Feng: Software, Investigation. Lijuan Yang: Formal analysis, Validation. Tianjun Wu: Reviewing, Editing. Jiangshe Zhang: Methodology, Supervision.

Corresponding author

Correspondence to Changpeng Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, C., Feng, L., Yang, L. et al. Dimensionality reduction by t-Distribution adaptive manifold embedding. Appl Intell 53, 23853–23863 (2023). https://doi.org/10.1007/s10489-023-04838-4
