FairLens: Auditing black-box clinical decision support systems

https://doi.org/10.1016/j.ipm.2021.102657
Open access under a Creative Commons license

Highlights

  • We present a pipeline to detect and explain potential fairness issues in clinical DSSs

  • We study and compare different multi-label classification disparity measures

  • We explore ICD-9 bias in MIMIC-IV, an openly available ICU benchmark dataset

Abstract

The pervasive application of algorithmic decision-making is raising concerns about the risk of unintended bias in AI systems deployed in critical settings such as healthcare. The detection and mitigation of model bias is a delicate task that should be tackled with care, keeping domain experts in the loop. In this paper we introduce FairLens, a methodology for discovering and explaining biases. We show how this tool can audit a fictional commercial black-box model acting as a clinical decision support system (DSS). In this scenario, healthcare facility experts can use FairLens on their historical data to discover the biases of the model before incorporating it into the clinical decision flow. FairLens first stratifies the available patient data according to demographic attributes such as age, ethnicity, gender and healthcare insurance; it then assesses the model performance on these groups, highlighting the most common misclassifications. Finally, FairLens allows the expert to examine one misclassification of interest by explaining which elements of the affected patients’ clinical history drive the model error in the problematic group. We validate FairLens’ ability to highlight bias in multilabel clinical DSSs by introducing a multilabel-appropriate disparity metric and demonstrating its efficacy against other standard metrics.
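The stratify-assess-rank steps described above can be illustrated with a minimal Python sketch. It assumes historical predictions are available as a pandas DataFrame; the column names, the single-code comparison, and the audit function below are hypothetical illustrations, not FairLens’ actual interface.

import pandas as pd

def audit(records: pd.DataFrame, attributes: list[str]) -> pd.Series:
    # Step 1: mark each record as correctly or incorrectly classified.
    # (A real multilabel audit would compare sets of ICD-9 codes per visit;
    # this toy version compares single code strings.)
    errors = records["predicted_code"] != records["true_code"]
    # Step 2: stratify by the chosen demographic attributes and compute the
    # misclassification rate within each subgroup.
    rates = errors.groupby([records[a] for a in attributes]).mean()
    # Step 3: rank subgroups so the most problematic ones surface first,
    # ready for a deeper explanation of what drives the errors.
    return rates.sort_values(ascending=False)

# Toy usage on a handful of fabricated historical predictions.
toy = pd.DataFrame({
    "gender":         ["F", "F", "M", "M"],
    "ethnicity":      ["A", "B", "A", "B"],
    "true_code":      ["401.9", "250.00", "401.9", "250.00"],
    "predicted_code": ["401.9", "401.9",  "401.9", "250.00"],
})
print(audit(toy, ["gender", "ethnicity"]))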

Keywords

Clinical decision support systems
Fairness and bias in machine learning systems
eXplainable artificial intelligence

DP and CP acknowledge funding from the European Union’s Horizon 2020 Excellent Science - European Research Council (ERC) programme under grant n. 834756 “XAI - Science and technology of eXplainable AI decision making” and partial support from the European Union’s Horizon 2020 research and innovation programme under grant n. 952026 “HumanE AI Network”, grant n. 871042 “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” and grant n. 952215 “TAILOR - Foundations of Trustworthy AI - Integrating Reasoning, Learning and Optimization”. PB, AP and AP acknowledge partial support from Research Project “Casa Nel Parco” (POR FESR14/20 - CANP - Cod. 320 - 16 - Piattaforma Tecnologica “Salute e Benessere”) funded by Regione Piemonte in the context of the Regional Platform on Health and Wellbeing, Italy, and from Intesa Sanpaolo Innovation Center, Italy. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.