Abstract
Algorithmic bias in healthcare mirrors existing data biases. However, the factors driving unfairness are not always known. Medical devices capture significant amounts of data but are prone to errors; for instance, pulse oximeters overestimate the arterial oxygen saturation of darker-skinned individuals, leading to worse outcomes. The impact of this bias on machine learning (ML) models remains unclear. This study addresses the technical challenges of quantifying the impact of medical device bias in downstream ML. Our experiments compare a “perfect world”, without pulse oximetry bias, using SaO\(_2\) (blood-gas), to the “actual world”, with biased measurements, using SpO\(_2\) (pulse oximetry). Under this counterfactual design, two models are trained with identical data, features, and settings, except for the method of measuring oxygen saturation: models using SaO\(_2\) are the “control” and models using SpO\(_2\) the “treatment”. The Blood-Gas and Oximetry Linked Dataset (BOLD) was a suitable test-bed, containing 163,396 nearly simultaneous SpO\(_2\)-SaO\(_2\) paired measurements, aligned with a wide array of clinical features and outcomes. We studied three classification tasks: in-hospital mortality, respiratory SOFA score in the next 24 h, and SOFA score increase by two points. Models using SaO\(_2\) instead of SpO\(_2\) generally showed better performance. For patients whose pulse oximetry overestimated oxygen saturation by \(\ge\) 3%, mortality prediction recall decreased significantly, from 0.63 to 0.59 (P < 0.001). This mirrors clinical processes in which biased pulse oximetry readings give clinicians false reassurance about patients’ oxygen levels. A similar degradation occurred in ML models, with pulse oximetry bias leading to more false negatives when predicting adverse outcomes.
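The subgroup comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`overestimated`, `recall`, `subgroup_recall`), the record fields, and the toy values are all hypothetical, and the predictions are hard-coded rather than produced by trained models. It only shows how recall might be computed for the "control" (SaO\(_2\)) and "treatment" (SpO\(_2\)) models within the group of patients whose SpO\(_2\) exceeds SaO\(_2\) by at least 3 percentage points.

```python
from typing import Dict, List, Sequence


def overestimated(spo2: float, sao2: float, threshold: float = 3.0) -> bool:
    """True when pulse oximetry reads at least `threshold` points above blood gas."""
    return (spo2 - sao2) >= threshold


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall = TP / (TP + FN); returns NaN when there are no positive cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def subgroup_recall(records: List[Dict], pred_key: str) -> float:
    """Recall of the predictions under `pred_key`, restricted to overestimated patients."""
    sub = [r for r in records if overestimated(r["spo2"], r["sao2"])]
    return recall([r["died"] for r in sub], [r[pred_key] for r in sub])


# Toy records (invented for illustration; not data or results from the study).
# pred_control: hypothetical output of the model given SaO2.
# pred_treatment: hypothetical output of the model given SpO2.
records = [
    {"spo2": 96, "sao2": 92, "died": 1, "pred_control": 1, "pred_treatment": 0},
    {"spo2": 95, "sao2": 91, "died": 1, "pred_control": 1, "pred_treatment": 1},
    {"spo2": 97, "sao2": 94, "died": 0, "pred_control": 0, "pred_treatment": 0},
    {"spo2": 94, "sao2": 93, "died": 1, "pred_control": 1, "pred_treatment": 1},
    {"spo2": 98, "sao2": 97, "died": 0, "pred_control": 0, "pred_treatment": 0},
]

control_recall = subgroup_recall(records, "pred_control")
treatment_recall = subgroup_recall(records, "pred_treatment")
print(f"control recall: {control_recall:.2f}, treatment recall: {treatment_recall:.2f}")
```

In this toy setup, a missed death in the overestimated subgroup lowers the treatment model's recall relative to the control, which is the kind of false-negative increase the abstract reports.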
Acknowledgements
This work has received funding from the Portuguese Foundation for Science and Technology (FCT) through the Ph.D. Grant “2020.06434.BD”.
Ethics declarations
Disclosure of Interests
The authors have no competing interests in the paper.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Martins, I., Matos, J., Gonçalves, T., Celi, L.A., Wong, A.K.I., Cardoso, J.S. (2025). Evaluating the Impact of Pulse Oximetry Bias in Machine Learning Under Counterfactual Thinking. In: Wu, S., Shabestari, B., Xing, L. (eds) Applications of Medical Artificial Intelligence. AMAI 2024. Lecture Notes in Computer Science, vol 15384. Springer, Cham. https://doi.org/10.1007/978-3-031-82007-6_21
DOI: https://doi.org/10.1007/978-3-031-82007-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-82006-9
Online ISBN: 978-3-031-82007-6
eBook Packages: Computer Science, Computer Science (R0)