A Cautionary Tale on Using Covid-19 Data for Machine Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12721)

Abstract

Introduction: High-quality, real-time epidemiological COVID-19 data are essential to fighting the pandemic with decision-support mechanisms based on statistics and machine learning.

Aims: To evaluate the resources available to and used by Portuguese health authorities to gather COVID-19 epidemiological data from the onset of the pandemic until December 2020. The analysis focused on two main topics: (a) work processes at the Public Health Unit (PHU) level and (b) registry forms for epidemiological reporting and control procedures. We also make recommendations on the requirements for overcoming data integration and interoperability problems, so that robust decision-support mechanisms can be built.

Methods: For topic (a), we reviewed the Portuguese Directorate-General of Health (DGS) guidelines for data treatment. For topic (b), we analysed the forms used during the first and second waves and compared them with the DGS metadata provided to researchers.
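As a rough illustration of the form-versus-metadata comparison in topic (b), the check can be framed as a set comparison between the fields a form collects and the variables the metadata describes. This is a minimal sketch, not the authors' procedure: the `coverage` helper and every field name below are hypothetical placeholders, and the real forms and DGS metadata are not reproduced here.

```python
def coverage(form_fields, metadata_variables):
    """Compare the fields on a form with the variables listed in a metadata file."""
    form, meta = set(form_fields), set(metadata_variables)
    return {
        "in_both": sorted(form & meta),        # collected and documented
        "form_only": sorted(form - meta),      # collected but not in the metadata
        "metadata_only": sorted(meta - form),  # documented but not on the form
    }


# Hypothetical field names, for illustration only.
form_v1 = ["age", "sex", "symptom_onset_date", "occupation"]
metadata = ["age", "sex", "symptom_onset_date", "outcome"]

report = coverage(form_v1, metadata)
for category, names in report.items():
    print(category, names)
```

Fields that land in `form_only` or `metadata_only` are the ones a researcher would need to query with the data provider before modelling.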

Results: On topic (a), we found that two complementary but non-interoperable systems are in use. Furthermore, the workflow does not appear to promote data quality, and it makes communication problems between health professionals more likely. On topic (b), comparing the two form versions, we found 27 deleted questions, 6 new questions, 1 displaced question, and 1 text modification.
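The four kinds of change reported for topic (b) (deleted, new, displaced, and reworded questions) can be derived mechanically when each question carries a stable identifier. The sketch below assumes such identifiers exist, which the abstract does not state; `diff_forms`, the question IDs, and the question texts are all hypothetical. Displacement is approximated with the standard library's `difflib.SequenceMatcher`: shared questions that fall outside its matching blocks are treated as moved.

```python
from difflib import SequenceMatcher


def diff_forms(old, new):
    """Classify changes between two form versions.

    `old` and `new` are ordered lists of (question_id, question_text) pairs.
    """
    old_text, new_text = dict(old), dict(new)
    old_ids = [qid for qid, _ in old]
    new_ids = [qid for qid, _ in new]

    deleted = [q for q in old_ids if q not in new_text]
    added = [q for q in new_ids if q not in old_text]

    # Questions present in both versions, in each version's own order.
    shared_old = [q for q in old_ids if q in new_text]
    shared_new = [q for q in new_ids if q in old_text]

    # Questions inside a matching block kept their relative order;
    # the remaining shared questions are reported as displaced.
    matcher = SequenceMatcher(None, shared_old, shared_new, autojunk=False)
    in_place = set()
    for block in matcher.get_matching_blocks():
        in_place.update(shared_old[block.a:block.a + block.size])
    displaced = [q for q in shared_old if q not in in_place]

    modified = [q for q in shared_old if old_text[q] != new_text[q]]
    return {"deleted": deleted, "added": added,
            "displaced": displaced, "modified": modified}


# Hypothetical form versions, for illustration only.
form_wave1 = [("q1", "Age"), ("q2", "Sex"),
              ("q3", "Date of symptom onset"), ("q4", "Occupation")]
form_wave2 = [("q2", "Sex"), ("q3", "Date of first symptoms"),
              ("q1", "Age"), ("q5", "Test date")]

changes = diff_forms(form_wave1, form_wave2)
print(changes)
```

Without stable identifiers, questions would have to be matched on their text, and a reworded question would then be indistinguishable from a deletion followed by an addition.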

Discussion: Neither the workflow nor the data-gathering methods are well suited to generating good-quality data. They neither support Public Health Professionals (PHP) effectively nor provide the elements needed for subsequent data analysis. Using data in decision-support mechanisms demands careful planning of the data that depict reality, and the forms currently in use do not meet this condition.

Acknowledgments

This work was carried out within, and funded by, the PhD Program in Health Data Science of the Faculty of Medicine of the University of Porto, Portugal (heads.med.up.pt).

Author information

Corresponding author

Correspondence to João Miguel Alves.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Nogueira-Leite, D., Alves, J.M., Marques-Cruz, M., Cruz-Correia, R. (2021). A Cautionary Tale on Using Covid-19 Data for Machine Learning. In: Tucker, A., Henriques Abreu, P., Cardoso, J., Pereira Rodrigues, P., Riaño, D. (eds) Artificial Intelligence in Medicine. AIME 2021. Lecture Notes in Computer Science, vol 12721. Springer, Cham. https://doi.org/10.1007/978-3-030-77211-6_30

  • DOI: https://doi.org/10.1007/978-3-030-77211-6_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77210-9

  • Online ISBN: 978-3-030-77211-6

  • eBook Packages: Computer Science, Computer Science (R0)
