Evaluating the Impact of Pulse Oximetry Bias in Machine Learning Under Counterfactual Thinking

  • Conference paper
Applications of Medical Artificial Intelligence (AMAI 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15384))

Abstract

Algorithmic bias in healthcare mirrors existing data biases. However, the factors driving unfairness are not always known. Medical devices capture significant amounts of data but are prone to errors; for instance, pulse oximeters overestimate the arterial oxygen saturation of darker-skinned individuals, leading to worse outcomes. The impact of this bias in machine learning (ML) models remains unclear. This study addresses the technical challenges of quantifying the impact of medical device bias in downstream ML. Our experiments compare a “perfect world”, without pulse oximetry bias, using SaO\(_2\) (blood-gas), to the “actual world”, with biased measurements, using SpO\(_2\) (pulse oximetry). Under this counterfactual design, two models are trained with identical data, features, and settings, except for the method of measuring oxygen saturation: models using SaO\(_2\) are a “control” and models using SpO\(_2\) a “treatment”. The blood-gas oximetry linked dataset was a suitable test-bed, containing 163,396 nearly-simultaneous SpO\(_2\) - SaO\(_2\) paired measurements, aligned with a wide array of clinical features and outcomes. We studied three classification tasks: in-hospital mortality, respiratory SOFA score in the next 24 h, and SOFA score increase by two points. Models using SaO\(_2\) instead of SpO\(_2\) generally showed better performance. Patients with overestimation of O\(_2\) by pulse oximetry of \(\ge \) 3% had significant decreases in mortality prediction recall, from 0.63 to 0.59, P < 0.001. This mirrors clinical processes where biased pulse oximetry readings provide clinicians with false reassurance of patients’ oxygen levels. A similar degradation happened in ML models, with pulse oximetry biases leading to more false negatives in predicting adverse outcomes.
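The comparison described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: `PairedRecord`, `toy_model`, the 90% cutoff, and every data value are hypothetical, and a fixed threshold rule stands in for the trained classifiers. It shows only the evaluation logic, in which the same decision rule is fed SaO\(_2\) (control) or SpO\(_2\) (treatment) and recall is computed within the subgroup whose pulse oximetry reading exceeds the blood-gas value by at least 3 percentage points.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PairedRecord:
    """One paired measurement with an outcome label (all values are made up)."""
    spo2: float  # pulse oximetry reading, in %
    sao2: float  # arterial blood-gas reading, in %
    died: int    # 1 = in-hospital death, 0 = survived


def overestimated(rec: PairedRecord, margin: float = 3.0) -> bool:
    """True when pulse oximetry reads at least `margin` points above blood gas."""
    return rec.spo2 - rec.sao2 >= margin


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall = TP / (TP + FN); returns NaN when there are no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def toy_model(saturation: float, cutoff: float = 90.0) -> int:
    """Hypothetical stand-in for a trained classifier: flag low saturation."""
    return 1 if saturation < cutoff else 0


# Hypothetical paired measurements, not drawn from any real dataset.
records = [
    PairedRecord(spo2=95.0, sao2=88.0, died=1),  # overestimated, hypoxemia hidden
    PairedRecord(spo2=93.0, sao2=89.0, died=1),  # overestimated, hypoxemia hidden
    PairedRecord(spo2=88.0, sao2=85.0, died=1),  # overestimated, low on both
    PairedRecord(spo2=97.0, sao2=96.0, died=0),  # readings agree closely
    PairedRecord(spo2=96.0, sao2=92.0, died=0),  # overestimated, survived
    PairedRecord(spo2=89.0, sao2=89.0, died=1),  # readings agree exactly
]

# Restrict evaluation to the subgroup with overestimation of >= 3 points.
subgroup = [r for r in records if overestimated(r)]
y_true = [r.died for r in subgroup]

# Same rule, same patients; only the oxygen saturation source differs.
control_pred = [toy_model(r.sao2) for r in subgroup]    # "perfect world"
treatment_pred = [toy_model(r.spo2) for r in subgroup]  # "actual world"

control_recall = recall(y_true, control_pred)
treatment_recall = recall(y_true, treatment_pred)

print(f"subgroup size: {len(subgroup)}")
print(f"control (SaO2) recall:   {control_recall:.2f}")
print(f"treatment (SpO2) recall: {treatment_recall:.2f}")
```

On this toy data the treatment arm misses two of the three deaths in the subgroup because the inflated SpO\(_2\) values sit above the cutoff. This is the same false-negative mechanism the abstract reports, although the magnitude here is an artifact of the invented numbers.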

Notes

  1. https://github.com/InesAMar/PulseOxBias

Acknowledgements

This work has received funding from the Portuguese Foundation for Science and Technology (FCT) through the Ph.D. Grant “2020.06434.BD”.

Author information

Corresponding authors

Correspondence to Inês Martins or Jaime S. Cardoso.

Ethics declarations

Disclosure of Interests

The authors have no competing interests in the paper.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Martins, I., Matos, J., Gonçalves, T., Celi, L.A., Wong, AK.I., Cardoso, J.S. (2025). Evaluating the Impact of Pulse Oximetry Bias in Machine Learning Under Counterfactual Thinking. In: Wu, S., Shabestari, B., Xing, L. (eds) Applications of Medical Artificial Intelligence. AMAI 2024. Lecture Notes in Computer Science, vol 15384. Springer, Cham. https://doi.org/10.1007/978-3-031-82007-6_21

  • DOI: https://doi.org/10.1007/978-3-031-82007-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-82006-9

  • Online ISBN: 978-3-031-82007-6

  • eBook Packages: Computer Science, Computer Science (R0)
