Evaluating the Impact of Pulse Oximetry Bias in Machine Learning Under Counterfactual Thinking

Martins, Inês; Matos, João; Gonçalves, Tiago; Celi, Leo A.; Wong, An-Kwok Ian; Cardoso, Jaime S.

doi:10.1007/978-3-031-82007-6_21


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15384)

Included in the following conference series:

International Workshop on Applications of Medical AI

Abstract

Algorithmic bias in healthcare mirrors existing data biases. However, the factors driving unfairness are not always known. Medical devices capture significant amounts of data but are prone to errors; for instance, pulse oximeters overestimate the arterial oxygen saturation of darker-skinned individuals, leading to worse outcomes. The impact of this bias on machine learning (ML) models remains unclear. This study addresses the technical challenges of quantifying the impact of medical device bias on downstream ML. Our experiments compare a “perfect world” without pulse oximetry bias, using SaO$_2$ (blood gas), to the “actual world” with biased measurements, using SpO$_2$ (pulse oximetry). Under this counterfactual design, two models are trained with identical data, features, and settings, differing only in how oxygen saturation is measured: models using SaO$_2$ are the “control” and models using SpO$_2$ the “treatment”. The blood-gas and oximetry linked dataset (BOLD) was a suitable test bed, containing 163,396 nearly simultaneous SpO$_2$-SaO$_2$ paired measurements, aligned with a wide array of clinical features and outcomes. We studied three classification tasks: in-hospital mortality, respiratory SOFA score in the next 24 h, and SOFA score increase by two points. Models using SaO$_2$ instead of SpO$_2$ generally performed better. For patients whose oxygen saturation was overestimated by pulse oximetry by $\ge$ 3%, mortality prediction recall decreased significantly, from 0.63 to 0.59 (P < 0.001). This mirrors clinical processes in which biased pulse oximetry readings give clinicians false reassurance about patients’ oxygen levels. A similar degradation occurred in ML models, with pulse oximetry bias leading to more false negatives when predicting adverse outcomes.
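The paired comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' code: the `Pair` record, the fixed-cutoff `predict` rule standing in for a trained model, the 92% cutoff, the six toy records, and the reading of "overestimation of $\ge$ 3%" as SpO$_2$ minus SaO$_2$ of at least 3 percentage points. The sketch only shows how recall can be compared between a control (SaO$_2$) and a treatment (SpO$_2$) predictor on the same overestimated subgroup.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    spo2: float   # pulse oximetry reading, in percent
    sao2: float   # arterial blood-gas reading, in percent
    adverse: int  # 1 if the adverse outcome occurred, else 0 (toy label)


def predict(saturation, cutoff=92.0):
    """Toy stand-in for a trained model: flag risk when saturation is low."""
    return 1 if saturation < cutoff else 0


def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def subgroup_recall(pairs, min_overestimation=3.0):
    """Recall of control (SaO2) and treatment (SpO2) predictions,
    restricted to pairs where SpO2 exceeds SaO2 by the given margin."""
    sub = [p for p in pairs if p.spo2 - p.sao2 >= min_overestimation]
    y_true = [p.adverse for p in sub]
    control = recall(y_true, [predict(p.sao2) for p in sub])
    treatment = recall(y_true, [predict(p.spo2) for p in sub])
    return control, treatment


# Hypothetical paired measurements; not taken from any real dataset.
pairs = [
    Pair(spo2=95.0, sao2=90.0, adverse=1),
    Pair(spo2=93.0, sao2=89.0, adverse=1),
    Pair(spo2=91.0, sao2=87.0, adverse=1),
    Pair(spo2=97.0, sao2=94.0, adverse=0),
    Pair(spo2=96.0, sao2=95.0, adverse=1),  # overestimation below 3, excluded
    Pair(spo2=90.0, sao2=90.0, adverse=1),  # no overestimation, excluded
]

control, treatment = subgroup_recall(pairs)
print(f"control recall (SaO2):   {control:.2f}")
print(f"treatment recall (SpO2): {treatment:.2f}")
```

In this toy data the SpO$_2$-based rule misses two of the three adverse cases that the SaO$_2$-based rule catches, which is the false-negative pattern the abstract describes. In a real analysis `predict` would be replaced by two separately trained models that share every other feature and setting.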


Notes

1. https://github.com/InesAMar/PulseOxBias


Acknowledgements

This work has received funding from the Portuguese Foundation for Science and Technology (FCT) through the Ph.D. Grant “2020.06434.BD”.

Author information

Authors and Affiliations

Faculty of Engineering, University of Porto, Porto, Portugal
Inês Martins, Tiago Gonçalves & Jaime S. Cardoso
Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
Inês Martins, Tiago Gonçalves & Jaime S. Cardoso
Duke University, Durham, USA
João Matos & An-Kwok Ian Wong
Massachusetts Institute of Technology, Cambridge, USA
João Matos & Leo A. Celi

Corresponding authors

Correspondence to Inês Martins or Jaime S. Cardoso.

Editor information

Editors and Affiliations

University of Pittsburgh, Pittsburgh, PA, USA
Shandong Wu
National Institute of Biomedical Imaging and Bioengineering, Bethesda, MD, USA
Behrouz Shabestari
Stanford University, Stanford, CA, USA
Lei Xing

Ethics declarations

Disclosure of Interests

The authors have no competing interests in the paper.

About this paper

Cite this paper

Martins, I., Matos, J., Gonçalves, T., Celi, L.A., Wong, AK.I., Cardoso, J.S. (2025). Evaluating the Impact of Pulse Oximetry Bias in Machine Learning Under Counterfactual Thinking. In: Wu, S., Shabestari, B., Xing, L. (eds) Applications of Medical Artificial Intelligence. AMAI 2024. Lecture Notes in Computer Science, vol 15384. Springer, Cham. https://doi.org/10.1007/978-3-031-82007-6_21

DOI: https://doi.org/10.1007/978-3-031-82007-6_21
Published: 08 February 2025
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-82006-9
Online ISBN: 978-3-031-82007-6
eBook Packages: Computer Science, Computer Science (R0)
