Towards Assessing Data Bias in Clinical Trials

Criscuolo, Chiara; Dolci, Tommaso; Salnitri, Mattia

doi:10.1007/978-3-031-23905-2_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13814)

Included in the following conference series:

VLDB Workshop on Data Management and Analytics for Medicine and Healthcare
VLDB Workshop on Polystore Systems for Heterogeneous Data in Multiple Databases with Privacy and Security Assurances

Abstract

Algorithms and technologies are essential tools that pervade all aspects of our daily lives. In recent decades, health care research has benefited from new computer-based recruiting methods, federated architectures for data storage, and innovative dataset analyses, among other advances. Nevertheless, health care datasets can still be affected by data bias. Biased datasets provide a distorted view of reality, leading to wrong analysis results and, consequently, wrong decisions. For example, in a clinical trial that studied the risk of cardiovascular diseases, predictions were wrong due to the lack of data on ethnic minorities. It is therefore of paramount importance for researchers to acknowledge the data bias that may be present in the datasets they use, adopt techniques to mitigate it where appropriate, and check whether and how analysis results are affected.

This paper proposes a method to address bias in datasets that: (i) defines the types of data bias that may be present in the dataset, (ii) characterizes and quantifies data bias with adequate metrics, and (iii) provides guidelines to identify, measure, and mitigate data bias for different data sources. The proposed method is applicable to both prospective and retrospective clinical trials. We evaluate our proposal through both theoretical considerations and interviews with researchers in the health care environment.
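As a rough illustration of what quantifying data bias in step (ii) could look like, the sketch below compares each group's share in a dataset with its share in a reference population. This is not the metric defined in the paper, which the abstract does not specify; the function name `representation_ratios`, the group labels, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def representation_ratios(records, reference_shares):
    """Return, per group, (share in dataset) / (share in reference population).

    A ratio well below 1.0 suggests the group is under-represented.
    Hypothetical helper for illustration only; not the paper's metric.
    """
    if not records:
        raise ValueError("records must not be empty")
    counts = Counter(records)
    total = len(records)
    ratios = {}
    for group, share in reference_shares.items():
        if share <= 0:
            raise ValueError(f"reference share for {group!r} must be positive")
        # Groups absent from the dataset get a count of 0, hence a ratio of 0.0.
        ratios[group] = (counts.get(group, 0) / total) / share
    return ratios


# Made-up data: the group label of each enrolled participant.
participants = ["A"] * 90 + ["B"] * 10
# Made-up reference population shares for the same groups.
reference = {"A": 0.7, "B": 0.2, "C": 0.1}

ratios = representation_ratios(participants, reference)
for group, ratio in sorted(ratios.items()):
    print(f"{group}: {ratio:.2f}")
```

In this made-up example, group B appears at half its reference share and group C is missing entirely, which is the kind of gap the cardiovascular example in the abstract describes. A real assessment would need a defensible reference population and, most likely, several complementary metrics.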

Acknowledgment

This work has been partially supported by the Health Big Data Project (CCR-2018-23669122), funded by the Italian Ministry of Economy and Finance and coordinated by the Italian Ministry of Health and the network Alleanza Contro il Cancro.

Author information

Authors and Affiliations

Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133, Milan, Italy
Chiara Criscuolo, Tommaso Dolci & Mattia Salnitri

Corresponding author

Correspondence to Chiara Criscuolo.

Editor information

Editors and Affiliations

Massachusetts Institute of Technology, Cambridge, MA, USA
El Kindi Rezig
Massachusetts Institute of Technology, Lexington, MA, USA
Vijay Gadepally
Intel Corporation, Portland, OR, USA
Timothy Mattson
Massachusetts Institute of Technology, Cambridge, MA, USA
Michael Stonebraker
Massachusetts Institute of Technology, Cambridge, MA, USA
Tim Kraska
Georgia State University, Atlanta, GA, USA
Jun Kong
University of Washington, Seattle, WA, USA
Gang Luo
Shandong University, Qingdao, China
Dejun Teng
Stony Brook University, Stony Brook, NY, USA
Fusheng Wang

About this paper

Cite this paper

Criscuolo, C., Dolci, T., Salnitri, M. (2022). Towards Assessing Data Bias in Clinical Trials. In: Rezig, E.K., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2022, Poly 2022. Lecture Notes in Computer Science, vol 13814. Springer, Cham. https://doi.org/10.1007/978-3-031-23905-2_5

DOI: https://doi.org/10.1007/978-3-031-23905-2_5
Published: 21 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23904-5
Online ISBN: 978-3-031-23905-2
eBook Packages: Computer Science, Computer Science (R0)
