DOI: 10.1145/3678717.3691293
short-paper
Open access

An Infectious Disease Spread Simulation to Control Data Bias

Published: 22 November 2024

Abstract

The increased availability of datasets during the COVID-19 pandemic enabled machine-learning approaches for modeling and forecasting infectious diseases. However, such approaches are known to amplify the bias in the data they are trained on. Bias in input data such as clinical case data for COVID-19 is difficult to measure due to disparities in testing availability, reporting standards, and healthcare access among different populations and regions. Furthermore, how such biases propagate through the modeling pipeline to decision-making is largely unknown. Therefore, we present a system that leverages a highly detailed agent-based model (ABM) of infectious disease spread in a city to simulate the collection of biased clinical case data where the bias is known. Our system allows users to load a pre-selected region or select their own (using OpenStreetMap data for the environment and census data for the population), and to specify population parameters, infectious disease parameters, and the degree(s) to which different populations will be overrepresented or underrepresented in the case data. In addition to the system, we provide a large number of benchmark datasets of case data at different levels of bias for different regions. We hope that infectious disease modelers will use these datasets to investigate how robust their models are to data bias and whether their models are overfit to biased data.
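
The idea of collecting case data with a known bias can be illustrated with a minimal sketch. This is not the authors' implementation: the group labels, the record fields (`agent_id`, `group`), the reporting probabilities, and both function names are hypothetical, and the ground-truth infections are a toy list rather than the output of an agent-based model. Each true infection is kept in the reported data with a probability that depends on its population group, so the degree of over- or underrepresentation is known by construction.

```python
import random
from collections import Counter


def sample_reported_cases(infections, report_prob, seed=0):
    """Keep each true infection with the reporting probability of its group."""
    rng = random.Random(seed)
    return [c for c in infections if rng.random() < report_prob[c["group"]]]


def representation_ratio(infections, reported):
    """Group share among reported cases divided by group share among true infections.

    A value above 1 means the group is overrepresented in the case data,
    a value below 1 means it is underrepresented.
    """
    true_counts = Counter(c["group"] for c in infections)
    rep_counts = Counter(c["group"] for c in reported)
    n_true, n_rep = len(infections), len(reported)
    if n_rep == 0:
        return {g: 0.0 for g in true_counts}
    return {
        g: (rep_counts[g] / n_rep) / (true_counts[g] / n_true)
        for g in true_counts
    }


# Hypothetical ground truth: 1000 infections in each of two groups.
infections = [
    {"agent_id": i, "group": "A" if i % 2 == 0 else "B"} for i in range(2000)
]
# Assumed reporting probabilities: group A is tested far more often than group B.
report_prob = {"A": 0.8, "B": 0.2}

reported = sample_reported_cases(infections, report_prob)
ratios = representation_ratio(infections, reported)
print(ratios)
```

Because the true infections are retained alongside the reported cases, a forecasting model trained on `reported` can be evaluated against `infections`, which is the kind of comparison that real clinical case data does not permit.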


    Published In

    SIGSPATIAL '24: Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems
    October 2024
    743 pages
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Bias Simulation
    2. Data Bias
    3. Data Simulation
    4. Infectious Disease Data

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Acceptance Rates

    SIGSPATIAL '24 Paper Acceptance Rate 37 of 122 submissions, 30%;
    Overall Acceptance Rate 257 of 1,238 submissions, 21%
