Abstract
The work aims to develop a method for assessing the quality of publicly available data collections on the spread of the COVID-19 pandemic that contain daily statistics on infections, recoveries and deaths. Data provided by the World Health Organization, the European Centre for Disease Prevention and Control, Johns Hopkins University and the Ministry of Health of the Republic of Poland serve as a proof of concept. Metrics are proposed that describe the most important quality features for this type of data collection: accuracy, completeness and consistency. Additional measures are defined based on anomaly detection, credibility and correlation between datasets. A quality assessment method that uses the defined metrics has been developed. The effectiveness of the measures was tested on original and modified data. The findings indicate that the measures were defined correctly: the method assigns lower-quality categories to datasets containing irregularities and higher-quality categories to datasets with fewer errors.
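As a rough illustration of what completeness and consistency measures for a daily series could look like, the following sketch computes two simple ratios. The function names, the formulas and the sample values are assumptions made for this example; the abstract does not give the paper's actual metric definitions.

```python
from typing import Optional


def completeness(values: list[Optional[int]]) -> float:
    """Share of daily entries that are present (not None).

    Assumed definition for illustration only.
    """
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def consistency(cumulative: list[Optional[int]]) -> float:
    """Share of consecutive reported values where a cumulative count
    does not decrease. Missing days are skipped.

    Assumed definition for illustration only.
    """
    reported = [v for v in cumulative if v is not None]
    if len(reported) < 2:
        return 1.0
    ok = sum(b >= a for a, b in zip(reported, reported[1:]))
    return ok / (len(reported) - 1)


# Hypothetical cumulative case counts for five consecutive days:
# one missing day and one decrease (120 -> 115).
cases = [100, 120, None, 115, 140]
print(completeness(cases))
print(consistency(cases))
```

Under these assumed definitions, a missing day lowers completeness, while a drop in a cumulative count lowers consistency. A real assessment would also need a rule for mapping such scores to quality categories.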
Acknowledgments
Part of the work presented in this paper received financial support from the statutory funds at the Wrocław University of Science and Technology.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Król, D., Bodek, A. (2023). Assessing Data Quality: An Approach for the Spread of COVID-19. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol 1863. Springer, Cham. https://doi.org/10.1007/978-3-031-42430-4_18
DOI: https://doi.org/10.1007/978-3-031-42430-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42429-8
Online ISBN: 978-3-031-42430-4
eBook Packages: Computer Science, Computer Science (R0)