t-SNE Highlights Phylogenetic and Temporal Patterns of SARS-CoV-2 Spike and Nucleocapsid Protein Evolution

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2022)

Abstract

Since the beginning of the COVID-19 pandemic, whole-genome sequences of SARS-CoV-2 have been continuously added to public databases, such as NCBI Virus [4] and GISAID [3].

References

  1. Adams, H., Blumstein, M., Kassab, L.: Multidimensional scaling on metric measure spaces. Rocky Mt. J. Math. 50(2), 397–413 (2020)

  2. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)

  3. Elbe, S., Buckland-Merrett, G.: Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob. Challenges 1(1), 33–46 (2017). https://doi.org/10.1002/gch2.1018

  4. Hatcher, E.L., et al.: Virus variation resource - improved response to emergent viral outbreaks. Nucleic Acids Res. 45(D1), D482–D490 (2016). https://doi.org/10.1093/nar/gkw1065

  5. Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. 89(22), 10915–10919 (1992). https://doi.org/10.1073/pnas.89.22.10915

  6. Hozumi, Y., Wang, R., Yin, C., Wei, G.-W.: UMAP-assisted K-means clustering of large-scale SARS-CoV-2 mutation datasets. Comput. Biol. Med. 131, 104264 (2021)

  7. Katoh, K., Standley, D.M.: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30(4), 772–780 (2013). https://doi.org/10.1093/molbev/mst010

  8. Krijthe, J.H.: Rtsne: T-Distributed Stochastic Neighbor Embedding using Barnes-Hut Implementation. R package version 0.16 (2015). https://github.com/jkrijthe/Rtsne

  9. Kroshnin, A., Stepanov, E., Trevisan, D.: Infinite multidimensional scaling for metric measure spaces. ESAIM: COCV 28, 58 (2022). https://doi.org/10.1051/cocv/2022053

  10. Lim, S., Mémoli, F.: Classical MDS on metric measure spaces. arXiv preprint arXiv:2201.09385 (2022)

  11. Lin, Q., Huang, Y., Jiang, Z., Feng, W., Ma, L.: Deciphering the subtype differentiation history of SARS-CoV-2 based on a new breadth-first searching optimized alignment method over a global data set of 24,768 sequences. Front. Genet. 11, 591833 (2021)

  12. McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018)

  13. O’Toole, Á., et al.: Tracking the international spread of SARS-CoV-2 lineages B.1.1.7 and B.1.351/501Y-V2 with grinch [version 2; peer review: 3 approved]. Wellcome Open Res. 6(121) (2021). https://doi.org/10.12688/wellcomeopenres.16661.2

  14. O’Toole, Á., et al.: Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evol. 7(2), veab064 (2021). https://doi.org/10.1093/ve/veab064

  15. Okada, P., et al.: Early transmission patterns of coronavirus disease 2019 (COVID-19) in travellers from Wuhan to Thailand, January 2020. Eurosurveillance 25(8), 2000097 (2020). https://doi.org/10.2807/1560-7917.ES.2020.25.8.2000097

  16. Pershina, E.V., et al.: The evolutionary space model to be used for the metagenomic analysis of molecular and adaptive evolution in the bacterial communities. In: Pontarotti, P. (eds) Evolutionary Biology: Genome Evolution, Speciation, Coevolution and Origin of Life, pp. 339–355. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07623-2_16

  17. R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2018). https://www.R-project.org/

  18. Rambaut, A., et al.: A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5(11), 1403–1407 (2020). https://doi.org/10.1038/s41564-020-0770-5

  19. van der Maaten, L.J.P., Hinton, G.E.: Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)

  20. Wang, J.: Geometric Structure of High-Dimensional Data and Dimensionality Reduction, vol. 13. Springer, Cham (2012)

  21. Wickham, H.: ggplot2: Elegant Graphics for Data Analysis. Springer, New York (2016)

  22. Wickham, H., et al.: Welcome to the tidyverse. J. Open Source Softw. 4(43), 1686 (2019). https://doi.org/10.21105/joss.01686

Acknowledgment

G. Tamazian was supported by Peter the Great St. Petersburg Polytechnic University in the framework of the Russian Federation’s Priority 2030 Strategic Academic Leadership Programme (Agreement 075-15-2021-1333).

S. Kryzhevich was supported by Gdańsk University of Technology through the DEC 14/2021/IDUB/I.1 grant under the Nobelium - ‘Excellence Initiative - Research University’ program.

Author information

Corresponding author

Correspondence to Sergey Kryzhevich.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tamazian, G. et al. (2022). t-SNE Highlights Phylogenetic and Temporal Patterns of SARS-CoV-2 Spike and Nucleocapsid Protein Evolution. In: Bansal, M.S., Cai, Z., Mangul, S. (eds) Bioinformatics Research and Applications. ISBRA 2022. Lecture Notes in Computer Science, vol 13760. Springer, Cham. https://doi.org/10.1007/978-3-031-23198-8_23

  • DOI: https://doi.org/10.1007/978-3-031-23198-8_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23197-1

  • Online ISBN: 978-3-031-23198-8

  • eBook Packages: Computer Science, Computer Science (R0)
