ICSE Conference Proceedings · Research Article
DOI: 10.1145/3510003.3510091

Training data debugging for the fairness of machine learning software

Published: 05 July 2022

Abstract

With the widespread application of machine learning (ML) software, especially in high-risk tasks, concerns about its unfairness have been raised by both the developers and the users of ML software. The unfairness of ML software refers to software behavior that is affected by sensitive features (e.g., sex), which leads to biased and potentially illegal decisions; it has become a pressing problem for the whole software engineering community.
Given the "data-driven" programming paradigm of ML software, we regard biased features in the training data as the root cause of this unfairness. Inspired by software debugging, we propose a novel method, Linear-regression based Training Data Debugging (LTDD), to debug the feature values in training data, i.e., to (a) identify which features, and which parts of them, are biased, and (b) exclude those biased parts so as to retain as much valuable, unbiased information as possible for building fair ML software. We conduct an extensive study on nine data sets and three classifiers to compare LTDD against four baseline methods. Experimental results show that (a) LTDD improves the fairness of ML software more effectively, with less or comparable damage to predictive performance, and (b) LTDD is more actionable for fairness improvement in realistic scenarios.
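The abstract describes LTDD only at a high level. As a rough illustration of the idea, the Python sketch below regresses each feature on the sensitive feature and, where the slope is statistically significant, keeps only the residual, i.e., the part of the feature not explained by the sensitive feature. This is a minimal sketch under our own assumptions: the function names, the significance threshold, and the use of scipy and scikit-learn are illustrative and should not be read as the paper's verified implementation.

    # Illustrative sketch (not the paper's implementation): per-feature linear
    # regression on the sensitive feature; significantly biased features are
    # replaced by their residuals before a classifier is trained.
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LogisticRegression

    def fit_debugger(X, s, alpha=0.05):
        """For each feature j, fit X[:, j] ~ s and record the slope and
        intercept when the slope is significant (assumed threshold alpha)."""
        params = {}
        for j in range(X.shape[1]):
            slope, intercept, r, p, stderr = stats.linregress(s, X[:, j])
            if p < alpha:  # feature is linearly associated with s
                params[j] = (slope, intercept)
        return params

    def apply_debugger(X, s, params):
        """Replace each flagged feature by its residual, excluding the part
        explained by the sensitive feature."""
        X_dbg = X.astype(float)
        for j, (slope, intercept) in params.items():
            X_dbg[:, j] = X[:, j] - (intercept + slope * s)
        return X_dbg

    # Usage on synthetic data: one feature correlates with the sensitive
    # feature s, one does not. The debugger is fit on training data only;
    # the same fitted regressions would also be applied to test features.
    rng = np.random.default_rng(0)
    s = rng.integers(0, 2, 500)                           # sensitive feature
    X = np.column_stack([rng.normal(size=500) + 0.8 * s,  # biased feature
                         rng.normal(size=500)])           # unbiased feature
    y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

    params = fit_debugger(X, s)
    clf = LogisticRegression().fit(apply_debugger(X, s, params), y)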



Published In

ICSE '22: Proceedings of the 44th International Conference on Software Engineering
May 2022
2508 pages
ISBN: 9781450392211
DOI: 10.1145/3510003

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ML software
  2. debugging
  3. fairness
  4. training data

Conference

ICSE '22

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Article Metrics

  • Downloads (Last 12 months): 224
  • Downloads (Last 6 weeks): 16
Reflects downloads up to 14 Feb 2025

Cited By
  • (2025) Fairness for machine learning software in education. Journal of Systems and Software 219:C. DOI: 10.1016/j.jss.2024.112244. Online publication date: 1-Jan-2025.
  • (2025) Why and How We Combine Multiple Deep Learning Models With Functional Overlaps. Journal of Software: Evolution and Process 37:2. DOI: 10.1002/smr.70003. Online publication date: 16-Feb-2025.
  • (2024) Dataset Construction through Ontology-Based Data Requirements Analysis. Applied Sciences 14:6 (2237). DOI: 10.3390/app14062237. Online publication date: 7-Mar-2024.
  • (2024) Fairness Concerns in App Reviews: A Study on AI-Based Mobile Apps. ACM Transactions on Software Engineering and Methodology 34:2 (1-30). DOI: 10.1145/3690633. Online publication date: 29-Aug-2024.
  • (2024) Balancing Fairness: Unveiling the Potential of SMOTE-Driven Oversampling in AI Model Enhancement. Proceedings of the 2024 9th International Conference on Machine Learning Technologies (21-29). DOI: 10.1145/3674029.3674034. Online publication date: 24-May-2024.
  • (2024) Fairness Testing: A Comprehensive Survey and Analysis of Trends. ACM Transactions on Software Engineering and Methodology 33:5 (1-59). DOI: 10.1145/3652155. Online publication date: 4-Jun-2024.
  • (2024) NeuFair: Neural Network Fairness Repair with Dropout. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1541-1553). DOI: 10.1145/3650212.3680380. Online publication date: 11-Sep-2024.
  • (2024) Efficient DNN-Powered Software with Fair Sparse Models. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (983-995). DOI: 10.1145/3650212.3680336. Online publication date: 11-Sep-2024.
  • (2024) Datactive: Data Fault Localization for Object Detection Systems. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (895-907). DOI: 10.1145/3650212.3680329. Online publication date: 11-Sep-2024.
  • (2024) Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey. ACM Journal on Responsible Computing 1:2 (1-52). DOI: 10.1145/3631326. Online publication date: 20-Jun-2024.
