Summarizing Global SARS-CoV-2 Geographical Spread by Phylogenetic Multitype Branching Models

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2021)

Abstract

Using available phylogeographical data of 3585 SARS-CoV-2 genomes, we attempt to provide a global picture of the virus’s dynamics in terms of directly interpretable parameters. To this end, we fit a hidden state multistate speciation and extinction model to a pre-estimated phylogenetic tree with information on the place of sampling of each strain. We find that, even with such coarse-grained data, the dominant transition rates exhibit weak similarities with the most popular continent-level aggregated airline passenger flight routes.
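
One way to make the comparison between dominant transition rates and aggregated flight routes concrete is to line up the between-region entries of both matrices and ask whether they rank routes similarly. The sketch below is illustrative only and is not the authors' method (their analysis lives in the R and RevBayes scripts in the repository). It assumes the estimated rates and the passenger counts are available as region-by-region tables; the three region names and every number are hypothetical placeholders, and Spearman rank correlation plus top-k route overlap are our own choice of similarity measures.

```python
# Sketch: do dominant between-region transition rates resemble air passenger
# flows? All numbers are HYPOTHETICAL placeholders, not results from the paper,
# and the similarity measures are illustrative choices.
from itertools import permutations

REGIONS = ["Asia", "Europe", "North America"]  # hypothetical subset of regions

# Hypothetical transition rates RATES[from][to]; the diagonal is unused.
RATES = {
    "Asia": {"Asia": 0.0, "Europe": 0.8, "North America": 0.3},
    "Europe": {"Asia": 0.1, "Europe": 0.0, "North America": 1.2},
    "North America": {"Asia": 0.05, "Europe": 0.4, "North America": 0.0},
}

# Hypothetical passenger counts, including within-region traffic.
PASSENGERS = {
    "Asia": {"Asia": 900, "Europe": 60, "North America": 40},
    "Europe": {"Asia": 55, "Europe": 700, "North America": 80},
    "North America": {"Asia": 45, "Europe": 75, "North America": 800},
}


def to_fractions(counts):
    """Divide every cell by the grand total so that all entries sum to 1."""
    total = sum(sum(row.values()) for row in counts.values())
    return {a: {b: v / total for b, v in row.items()} for a, row in counts.items()}


def off_diagonal(matrix, regions):
    """Keep only between-region entries, keyed by (from, to)."""
    return {(a, b): matrix[a][b] for a, b in permutations(regions, 2)}


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def top_k_overlap(a, b, k):
    """Share of the k largest routes in `a` that are also among the k largest in `b`."""
    top_a = set(sorted(a, key=a.get, reverse=True)[:k])
    top_b = set(sorted(b, key=b.get, reverse=True)[:k])
    return len(top_a & top_b) / k


rates = off_diagonal(RATES, REGIONS)
flows = off_diagonal(to_fractions(PASSENGERS), REGIONS)
keys = list(rates)  # use the same (from, to) order for both vectors
rho = spearman([rates[k] for k in keys], [flows[k] for k in keys])
overlap = top_k_overlap(rates, flows, k=2)
print(f"Spearman rho = {rho:.3f}, top-2 overlap = {overlap:.2f}")
```

Because rank correlation is unchanged when all counts are divided by the same total, converting passenger counts to volume fractions only matters for measures that depend on magnitudes; the top-k overlap focuses specifically on the "dominating" routes.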

Availability of Data and Materials

The R scripts, RevBayes scripts and MCMC chains, together with the phylogenetic tree, the geographical classification, and the within- and between-region air passenger volume fractions used, are available at https://github.com/KHDS-mod/COVID-19-HiSSE and https://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-185867.

An already constructed phylogenetic tree and strain (i.e. leaf) data were downloaded from NextStrain (https://nextstrain.org/ncov/global) on 26th April 2020. This data set contains 3585 genomes sampled between December 2019 and April 2020. A full acknowledgments table of the research groups and authors worldwide who generated the sequence data from which NextStrain’s phylogenetic tree is constructed is provided in the nextstrain_ncov_global_authors.tsv file in the COVID-19-HiSSE repository.

The geographic distribution of COVID-19 case fatalities worldwide (presented in Tab. 1) was downloaded from the European Centre for Disease Prevention and Control (ECDC, https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide) on 11th May 2020. We took the subset of the case fatalities for 26th April 2020 corresponding to NextStrain’s sequences. The North America region includes the following countries: Canada, Mexico, Panama, USA. The South America region includes the following countries: Brazil, Chile, Colombia, Ecuador, Peru, Uruguay. The 5 deaths from Georgia were subtracted from Europe and added to Asia, because Georgia is classified as Asia in the NextStrain data. In addition, 7 deaths are not assigned to any region by ECDC. These are labelled “Cases on an international conveyance Japan” and appear to correspond to deaths on cruise ships; we excluded them completely.

The air passenger data were obtained through the commercial provider SABRE [18]. Data are consolidated for the years 2019 and 2020.
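
The regional aggregation of the ECDC fatalities described above can be sketched as a small mapping step. This is a minimal sketch, not the authors' R code: it assumes the input is a list of (country, ECDC continent label, deaths) rows already restricted to the countries with NextStrain sequences. The continent labels "America" and "Other", the rule that drops other American countries, and all death counts except Georgia's 5 and the 7 conveyance deaths are hypothetical toy values, not the data in Tab. 1.

```python
# Sketch: aggregate ECDC country-level deaths into NextStrain-style regions.
# Region lists and the Georgia / cruise-ship rules follow the text; the row
# layout, continent labels and toy death counts (except Georgia's 5 and the
# 7 on the international conveyance) are HYPOTHETICAL.
from collections import defaultdict

NORTH_AMERICA = {"Canada", "Mexico", "Panama", "USA"}
SOUTH_AMERICA = {"Brazil", "Chile", "Colombia", "Ecuador", "Peru", "Uruguay"}
CONVEYANCE = "Cases on an international conveyance Japan"


def to_region(country, ecdc_continent):
    """Return the NextStrain-style region for a country, or None to drop it."""
    if country in NORTH_AMERICA:
        return "North America"
    if country in SOUTH_AMERICA:
        return "South America"
    if country == "Georgia":
        return "Asia"  # NextStrain classifies Georgia as Asia, ECDC as Europe
    if ecdc_continent == "America":
        return None  # assumption: other American countries are outside the subset
    return ecdc_continent


def aggregate(rows):
    """Sum deaths per region, excluding the unassigned cruise-ship deaths."""
    totals = defaultdict(int)
    for country, continent, deaths in rows:
        if country == CONVEYANCE:
            continue  # excluded completely, as described in the text
        region = to_region(country, continent)
        if region is not None:
            totals[region] += deaths
    return dict(totals)


toy_rows = [
    ("Georgia", "Europe", 5),
    ("Italy", "Europe", 100),  # toy value
    ("USA", "America", 200),  # toy value
    ("Brazil", "America", 50),  # toy value
    ("China", "Asia", 30),  # toy value
    (CONVEYANCE, "Other", 7),  # continent label is a hypothetical placeholder
]
totals = aggregate(toy_rows)
print(totals)
```

Mapping Georgia directly to Asia is equivalent to subtracting its deaths from the Europe total and adding them to Asia, as done in the text.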

References

  1. Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Parzen, E., Tanabe, K., Kitagawa, G. (eds.) Selected Papers of Hirotugu Akaike. Springer Series in Statistics, pp. 199–213. Springer, New York (1998). https://doi.org/10.1007/978-1-4612-1694-0_15

  2. Beaulieu, J.M., O’Meara, B.C.: Detecting hidden diversification shifts in models of trait-dependent speciation and extinction. Syst. Biol. 65(4), 583–601 (2016). https://doi.org/10.1093/sysbio/syw022

  3. Cole, D.J.: Parameter redundancy and identifiability in hidden Markov models. METRON 77, 105–118 (2019). https://doi.org/10.1007/s40300-019-00156-3

  4. FitzJohn, R.G.: Diversitree: comparative phylogenetic analyses of diversification in R. Methods Ecol. Evol. 3, 1084–1092 (2012). https://doi.org/10.1111/j.2041-210X.2012.00234.x

  5. Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, 2nd edn, pp. 296–297. CRC Press, Boca Raton (2004)

  6. Geoghegan, J.L., et al.: Genomic epidemiology reveals transmission patterns and dynamics of SARS-CoV-2 in Aotearoa New Zealand. Nature Commun. 11(1), 6351 (2020). https://doi.org/10.1038/s41467-020-20235-8

  7. Hadfield, J., et al.: Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34(23), 4121–4123 (2018). https://doi.org/10.1093/bioinformatics/bty407

  8. Höhna, S., et al.: RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Syst. Biol. 65(4), 726–736 (2016). https://doi.org/10.1093/sysbio/syw021

  9. Kermack, W.O., McKendrick, A.G., Walker, G.T.: A contribution to the mathematical theory of epidemics. Proc. Roy. Soc. Lond. Ser. A Containing Papers Math. Phys. Character 115(772), 700–721 (1927). https://doi.org/10.1098/rspa.1927.0118

  10. Lemieux, J.E., et al.: Phylogenetic analysis of SARS-CoV-2 in Boston highlights the impact of superspreading events. Science 371(6529) (2021). https://doi.org/10.1126/science.abe3261

  11. Newton, M.A., Raftery, A.E.: Approximate Bayesian inference with the weighted likelihood bootstrap. J. Roy. Stat. Soc. Ser. B (Methodol.) 56(1), 3–26 (1994)

  12. Pan, B., et al.: Identification of epidemiological traits by analysis of SARS-CoV-2 sequences. Viruses 13(5), 764 (2021). https://doi.org/10.3390/v13050764

  13. Paradis, E., Schliep, K.: ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019)

  14. Popa, A., et al.: Genomic epidemiology of superspreading events in Austria reveals mutational dynamics and transmission properties of SARS-CoV-2. Sci. Transl. Med. 12(573) (2020). https://doi.org/10.1126/scitranslmed.abe2555

  15. Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26(7), 1641–1650 (2009). https://doi.org/10.1093/molbev/msp077

  16. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2020). https://www.R-project.org/

  17. Revell, L.J.: phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223 (2012)

  18. SABRE: Sabre market intelligence platform (2020). https://www.sabreairlinesolutions.com/images/uploads/AirVision-Market-Intelligence_GDD_Profile_Sabre.pdf

  19. Sagulenko, P., Puller, V., Neher, R.A.: TreeTime: maximum-likelihood phylodynamic analysis. Virus Evol. 4(1), vex042 (2018). https://doi.org/10.1093/ve/vex042

  20. Schwarz, G.E.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978). https://doi.org/10.1214/aos/1176344136

  21. Sjaarda, C.P., et al.: Phylogenomics reveals viral sources, transmission, and potential superinfection in early-stage COVID-19 patients in Ontario, Canada. Sci. Rep. 11(1) (2021). https://doi.org/10.1038/s41598-021-83355-1

  22. Takahashi, S., Greenhouse, B., Rodríguez-Barraquer, I.: Are seroprevalence estimates for severe acute respiratory syndrome coronavirus 2 biased? J. Infect. Dis. 222(11), 1772–1775 (2020). https://doi.org/10.1093/infdis/jiaa523

  23. Yanev, N.M., Stoimenova, V.K., Atanasov, D.V.: Branching stochastic processes as models of Covid-19 epidemic development. arXiv e-prints (2020)

Acknowledgements

We thank Fredrik Ronquist for very valuable comments. K.B.’s research is supported by Vetenskapsrådet (Swedish Research Council) grant 2017-04951 and partially by an ELLIIT Call C grant. H.K.’s research is partially supported by Vetenskapsrådet grant 2017-04951.

Author information

Corresponding author

Correspondence to Hao Chi Kiang.

Electronic supplementary material

Supplementary material 1 (pdf 30674 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kiang, H.C., Bartoszek, K., Sakowski, S., Iacus, S.M., Vespe, M. (2022). Summarizing Global SARS-CoV-2 Geographical Spread by Phylogenetic Multitype Branching Models. In: Chicco, D., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2021. Lecture Notes in Computer Science, vol. 13483. Springer, Cham. https://doi.org/10.1007/978-3-031-20837-9_14

  • DOI: https://doi.org/10.1007/978-3-031-20837-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20836-2

  • Online ISBN: 978-3-031-20837-9

  • eBook Packages: Computer Science, Computer Science (R0)
