Abstract
The unprecedented scale of genome sequencing during the SARS-CoV-2 pandemic has made processing the resulting genomic data a challenge in its own right. State-of-the-art phylogenetic methods were mostly designed for significantly sparser data and require extensive subsampling of strains. We present \((\varepsilon, \tau)\)-MSN, a novel tool that reconstructs a viral genetic relatedness network from genetic distances and can process hundreds of thousands of sequences within several hours. We applied \((\varepsilon, \tau)\)-MSN to the global COVID-19 outbreak data and built a genetic network on more than 100,000 SARS-CoV-2 sequences. We show that \((\varepsilon, \tau)\)-MSN can accurately detect transmission events and that the resulting genetic network has significantly higher assortativity with respect to the continent and country attributes of SARS-CoV-2 samples. The source code for this software suite is available at https://github.com/Sergey-Knyazev/eMST.
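To make the assortativity claim concrete, the following is a minimal sketch of how assortativity with respect to a categorical attribute (such as continent) could be measured on a distance-based network. It is not the \((\varepsilon, \tau)\)-MSN algorithm: as a stand-in it links every pair of samples whose Hamming distance is at most a threshold, and it then computes Newman's assortativity coefficient for a categorical label. The function names, the toy sequences, the labels, and the choice of Hamming distance are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def threshold_network(seqs, tau):
    """Edges between samples whose Hamming distance is at most tau.

    A plain distance-threshold network, used here only as a stand-in
    for a genetic relatedness network (assumption, not the paper's method).
    """
    return [
        (i, j)
        for i, j in combinations(sorted(seqs), 2)
        if hamming(seqs[i], seqs[j]) <= tau
    ]


def attribute_assortativity(edges, label):
    """Newman's assortativity coefficient for a categorical node attribute."""
    if not edges:
        raise ValueError("need at least one edge")
    edge_ends = 2 * len(edges)  # each undirected edge contributes two ends
    same = 0                    # edge ends whose other end has the same label
    ends_per_label = {}
    for u, v in edges:
        lu, lv = label[u], label[v]
        if lu == lv:
            same += 2
        ends_per_label[lu] = ends_per_label.get(lu, 0) + 1
        ends_per_label[lv] = ends_per_label.get(lv, 0) + 1
    observed = same / edge_ends
    expected = sum((c / edge_ends) ** 2 for c in ends_per_label.values())
    if expected == 1:
        raise ValueError("undefined when all edge ends share one label")
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): two pairs of near-identical sequences.
seqs = {"s1": "AAAA", "s2": "AAAT", "s3": "GGGG", "s4": "GGGC"}
continent = {"s1": "Europe", "s2": "Europe", "s3": "Asia", "s4": "Asia"}

edges = threshold_network(seqs, tau=1)
r = attribute_assortativity(edges, continent)
print(edges, r)
```

In this toy case every edge joins samples from the same continent, so the coefficient reaches its maximum of 1; values near 0 would indicate that links are no more likely within a continent than expected by chance.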
Acknowledgement
DN, SK, and AZ were partially supported by NSF grants 1564899 and 16119110 and by NIH grant 1R01EB025022-01. PS was partially supported by NIH grant 1R01EB025022-01 and NSF grant 2047828. SK was partially supported by the GSU Molecular Basis of Disease Fellowship. SM was partially supported by NSF grant 2041984.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Knyazev, S. et al. (2021). A Novel Network Representation of SARS-CoV-2 Sequencing Data. In: Wei, Y., Li, M., Skums, P., Cai, Z. (eds) Bioinformatics Research and Applications. ISBRA 2021. Lecture Notes in Computer Science, vol 13064. Springer, Cham. https://doi.org/10.1007/978-3-030-91415-8_15
DOI: https://doi.org/10.1007/978-3-030-91415-8_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91414-1
Online ISBN: 978-3-030-91415-8
eBook Packages: Computer Science, Computer Science (R0)