ICSE Conference Proceedings · Research Article
DOI: 10.1145/3510003.3510091

Training data debugging for the fairness of machine learning software

Published: 05 July 2022

Abstract

With the widespread application of machine learning (ML) software, especially in high-risk tasks, concerns about its unfairness have been raised by both the developers and the users of ML software. The unfairness of ML software refers to software behavior that is affected by sensitive features (e.g., sex), which leads to biased and potentially illegal decisions; it has become a pressing problem for the whole software engineering community.
Given the "data-driven" programming paradigm of ML software, we regard biased features in the training data as the root cause of this unfairness. Inspired by software debugging, we propose a novel method, Linear-regression based Training Data Debugging (LTDD), to debug the feature values in training data, i.e., to (a) identify which features, and which parts of them, are biased, and (b) exclude those biased parts so as to retain as much valuable, unbiased information as possible for building fair ML software. We conduct an extensive study on nine data sets and three classifiers to compare LTDD against four baseline methods. Experimental results show that (a) LTDD improves the fairness of ML software more effectively, with less or comparable damage to predictive performance, and (b) LTDD is more actionable for fairness improvement in realistic scenarios.
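The abstract describes LTDD only at a high level. As a rough illustration of the idea, the Python sketch below regresses each feature on the sensitive feature and, where the slope is statistically significant, keeps only the residual, i.e., the part of the feature not explained by the sensitive feature. This is a minimal sketch under our own assumptions: the function names, the significance threshold, and the use of scipy and scikit-learn are illustrative and should not be read as the paper's verified implementation.

    # Illustrative sketch (not the paper's implementation): per-feature linear
    # regression on the sensitive feature; significantly biased features are
    # replaced by their residuals before a classifier is trained.
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LogisticRegression

    def fit_debugger(X, s, alpha=0.05):
        """For each feature j, fit X[:, j] ~ s and record the slope and
        intercept when the slope is significant (assumed threshold alpha)."""
        params = {}
        for j in range(X.shape[1]):
            slope, intercept, r, p, stderr = stats.linregress(s, X[:, j])
            if p < alpha:  # feature is linearly associated with s
                params[j] = (slope, intercept)
        return params

    def apply_debugger(X, s, params):
        """Replace each flagged feature by its residual, excluding the part
        explained by the sensitive feature."""
        X_dbg = X.astype(float)
        for j, (slope, intercept) in params.items():
            X_dbg[:, j] = X[:, j] - (intercept + slope * s)
        return X_dbg

    # Usage on synthetic data: one feature correlates with the sensitive
    # feature s, one does not. The debugger is fit on training data only;
    # the same fitted regressions would also be applied to test features.
    rng = np.random.default_rng(0)
    s = rng.integers(0, 2, 500)                           # sensitive feature
    X = np.column_stack([rng.normal(size=500) + 0.8 * s,  # biased feature
                         rng.normal(size=500)])           # unbiased feature
    y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

    params = fit_debugger(X, s)
    clf = LogisticRegression().fit(apply_debugger(X, s, params), y)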



Published In

ICSE '22: Proceedings of the 44th International Conference on Software Engineering
May 2022
2508 pages
ISBN: 9781450392211
DOI: 10.1145/3510003

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ML software
  2. debugging
  3. fairness
  4. training data

Conference

ICSE '22

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Article Metrics

  • Downloads (Last 12 months): 224
  • Downloads (Last 6 weeks): 16
Reflects downloads up to 14 Feb 2025

Cited By
  • (2025) Fairness for machine learning software in education. Journal of Systems and Software 219:C. DOI: 10.1016/j.jss.2024.112244. Online publication date: 1-Jan-2025.
  • (2025) Why and How We Combine Multiple Deep Learning Models With Functional Overlaps. Journal of Software: Evolution and Process 37:2. DOI: 10.1002/smr.70003. Online publication date: 16-Feb-2025.
  • (2024) Dataset Construction through Ontology-Based Data Requirements Analysis. Applied Sciences 14:6 (2237). DOI: 10.3390/app14062237. Online publication date: 7-Mar-2024.
  • (2024) Fairness Concerns in App Reviews: A Study on AI-Based Mobile Apps. ACM Transactions on Software Engineering and Methodology 34:2 (1-30). DOI: 10.1145/3690633. Online publication date: 29-Aug-2024.
  • (2024) Balancing Fairness: Unveiling the Potential of SMOTE-Driven Oversampling in AI Model Enhancement. Proceedings of the 2024 9th International Conference on Machine Learning Technologies (21-29). DOI: 10.1145/3674029.3674034. Online publication date: 24-May-2024.
  • (2024) Fairness Testing: A Comprehensive Survey and Analysis of Trends. ACM Transactions on Software Engineering and Methodology 33:5 (1-59). DOI: 10.1145/3652155. Online publication date: 4-Jun-2024.
  • (2024) NeuFair: Neural Network Fairness Repair with Dropout. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1541-1553). DOI: 10.1145/3650212.3680380. Online publication date: 11-Sep-2024.
  • (2024) Efficient DNN-Powered Software with Fair Sparse Models. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (983-995). DOI: 10.1145/3650212.3680336. Online publication date: 11-Sep-2024.
  • (2024) Datactive: Data Fault Localization for Object Detection Systems. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (895-907). DOI: 10.1145/3650212.3680329. Online publication date: 11-Sep-2024.
  • (2024) Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey. ACM Journal on Responsible Computing 1:2 (1-52). DOI: 10.1145/3631326. Online publication date: 20-Jun-2024.
