Reduced multidimensional scaling

Paradis, Emmanuel

doi:10.1007/s00180-021-01116-0


Original paper
Published: 05 June 2021

Volume 37, pages 91–105, (2022)

Computational Statistics

Emmanuel Paradis (ORCID: orcid.org/0000-0003-3092-2199)


Abstract

Dimension reduction is a common problem when analysing large data sets. The present paper proposes a method called reduced multidimensional scaling, which starts by performing a standard multidimensional scaling on a reduced data set. The main difficulty is finding a representative reduced sample. An algorithm is presented that performs this selection by alternating between sampling observations in outlier areas and sampling observations in high-density areas. A space is then constructed from the selected reduced sample by standard multidimensional scaling using pairwise distances. The observations not included in the reduced sample are then projected onto the constructed space using Gower’s formula in order to obtain a final representation of the whole data set. The only requirement is the ability to compute distances among observations. A simulation study showed that the proposed algorithm performs well at detecting outliers. Evaluation of running times suggests that the proposed method could run in a few hours on data sets that would take more than one year to analyse with standard multidimensional scaling. An application is presented with a data set of 9547 DNA sequences of human immunodeficiency viruses.
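
The two-stage idea in the abstract (classical scaling on a reduced sample, then projection of the remaining observations with Gower's add-a-point formula) can be sketched as follows. This is a minimal illustration and not the paper's implementation: the reduced sample is drawn uniformly at random as a stand-in for the selection algorithm, which is not detailed here, and Euclidean distances on synthetic data are assumed. The function names `pairwise_sq_dists`, `classical_mds`, and `gower_project` are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.einsum("ijk,ijk->ij", diff, diff)


def classical_mds(D2, k):
    """Classical MDS from an (m, m) matrix of squared distances.

    Returns the (m, k) coordinates, the k leading eigenvalues, and the
    diagonal of the double-centred matrix B (squared distances of the
    reference points to their centroid).
    """
    m = D2.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m      # centring matrix
    B = -0.5 * J @ D2 @ J                    # double centring
    evals, evecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:k]        # keep the k largest
    lam = evals[idx]                         # assumed positive here
    X = evecs[:, idx] * np.sqrt(lam)
    return X, lam, np.diag(B)


def gower_project(d2_new, X, lam, b):
    """Gower's add-a-point formula: y = 1/2 * Lambda^{-1} X^T (b - d2).

    d2_new is (n_new, m): squared distances from each new observation
    to the m reference observations.
    """
    return 0.5 * (b[None, :] - d2_new) @ X / lam


# Synthetic 2-D data, so a 2-D embedding can reproduce all distances.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))

# Stand-in for the paper's selection step: a uniform random subset.
ref = rng.choice(200, size=30, replace=False)
rest = np.setdiff1d(np.arange(200), ref)

# Stage 1: classical MDS on the reduced sample only.
X_ref, lam, b = classical_mds(pairwise_sq_dists(data[ref], data[ref]), k=2)

# Stage 2: project the remaining observations onto the constructed space.
X_rest = gower_project(pairwise_sq_dists(data[rest], data[ref]), X_ref, lam, b)

embedding = np.empty((200, 2))
embedding[ref] = X_ref
embedding[rest] = X_rest
```

Only the 30 x 30 matrix is eigendecomposed; the other 170 observations need just their distances to the 30 reference points, which is where the saving over a full 200 x 200 scaling would come from in this toy setting.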


References

Abraham G, Inouye M (2014) Fast principal component analysis of large-scale genome-wide data. PLoS ONE 9(4):e93766. https://doi.org/10.1371/journal.pone.0093766
Baglama J, Reichel L (2005) Augmented implicitly restarted Lanczos bidiagonalization methods. SIAM J Sci Comput 27(1):19–42
Baglama J, Reichel L, Lewis BW (2019) irlba: fast truncated singular value decomposition and principal components analysis for large dense and sparse matrices. https://CRAN.R-project.org/package=irlba, R package version 2.3.3
Becht E, McInnes L, Healy J, Dutertre CA, Kwok IWH, Ng LG, Ginhoux F, Newell EW (2019) Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37:38–44. https://doi.org/10.1038/nbt.4314
Beugin MP, Gayet T, Pontier D, Devillard S, Jombart T (2018) A fast likelihood solution to the genetic clustering problem. Methods Ecol Evol 9(4):1006–1016. https://doi.org/10.1111/2041-210X.12968
Degras D, Cardot H (2016) Online principal component analysis. https://CRAN.R-project.org/package=onlinePCA, R package version 1.3.1
D’Enza AI, Markos A, Buttarazzi D (2018) The idm package: incremental decomposition methods in R. J Stat Softw Code Snippets 86(4):1–24. https://doi.org/10.18637/jss.v086.c04
Erichson NB, Voronin S, Brunton SL, Kutz JN (2019) Randomized matrix decompositions using R. J Stat Softw 89(11):1–48. https://doi.org/10.18637/jss.v089.i11
Franch G, Jurman G, Coviello L, Pendesini M, Furlanello C (2019) MASS-UMAP: fast and accurate analog ensemble search in weather radar archives. Remote Sens 11(24):2922. https://doi.org/10.3390/rs11242922
Gower JC (1968) Adding a point to vector diagrams in multivariate analysis. Biometrika 55(3):582–585
Halko N, Martinsson PG, Tropp JA (2011) Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev 53(2):217–288. https://doi.org/10.1137/090771806
Kruskal JB (1964) Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1):1–27
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137
McInnes L, Healy J, Saul N, Großberger L (2018) UMAP: uniform manifold approximation and projection. J Open Sour Softw 3:861. https://doi.org/10.21105/joss.00861
Mirarab S, Nguyen N, Guo S, Wang LS, Kim J, Warnow T (2015) PASTA: ultra-large multiple sequence alignment for nucleotide and amino-acid sequences. J Comput Biol 22(5):377–386
Paradis E (2018) Multidimensional scaling with very large data sets. J Comput Gr Stat 27(4):935–939. https://doi.org/10.1080/10618600.2018.1470001
Paradis E (2020) Population genomics with R. Chapman & Hall, Boca Raton, FL
Paradis E, Schliep K (2019) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35(3):526–528. https://doi.org/10.1093/bioinformatics/bty633
Qiu Y, Mei J (2019) RSpectra: solvers for large-scale eigenvalue and SVD problems. R package version 0.16-0
R Core Team (2021) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria
Roweis S (1998) EM algorithms for PCA and SPCA. In: Neural Information Processing Systems 10 (NIPS’97), pp 626–632
Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, Iyer R, Schatz MC, Sinha S, Robinson GE (2015) Big data: astronomical or genomical? PLoS Biol 13(7):e1002195
Sun S, Zhu J, Ma Y, Zhou X (2019) Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol 20:269
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York
Wan S, Kim J, Won KJ (2020) SHARP: hyperfast and accurate processing of single-cell RNA-seq via ensemble random projection. Genome Res 30:205–213


Acknowledgements

I am grateful to two anonymous reviewers for their constructive comments on a previous version of this article. This is publication ISEM 2021-118.

Author information

Authors and Affiliations

ISEM, IRD, CNRS, EPHE, University of Montpellier, Montpellier, France
Emmanuel Paradis


Corresponding author

Correspondence to Emmanuel Paradis.


About this article

Cite this article

Paradis, E. Reduced multidimensional scaling. Comput Stat 37, 91–105 (2022). https://doi.org/10.1007/s00180-021-01116-0


Received: 22 May 2020
Accepted: 27 May 2021
Published: 05 June 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s00180-021-01116-0
