Abstract
Bias in the training data can be inherited by machine learning models and then reproduced in socially sensitive decision-making tasks, leading to potentially discriminatory decisions. State-of-the-art pre-processing methods for mitigating unfairness in datasets mainly consider a single binary sensitive attribute. We devise GenFair, a fairness-enhancing data pre-processing method able to deal with two or more sensitive attributes, possibly multi-valued, at once. The core of the approach is a genetic algorithm for instance generation, which accounts for the plausibility of the synthetic instances w.r.t. the distribution of the original dataset. Results show that GenFair is on par with, or even better than, state-of-the-art approaches.
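The core genetic loop can be pictured with a minimal sketch. The fitness proxy, operators, and parameters below are assumptions for illustration only, not GenFair's actual implementation; in particular, a real fitness would also reward fairness objectives (balancing sensitive groups), not plausibility alone.

```python
# A minimal, illustrative genetic loop for plausibility-aware instance
# generation. NOT GenFair's implementation: the fitness proxy, operators,
# and parameters are assumptions chosen only to convey the idea.
import numpy as np

rng = np.random.default_rng(42)

def plausibility(candidate, X):
    """Crude plausibility proxy: negated distance to the nearest
    real instance, so candidates near the data manifold score higher."""
    return -np.linalg.norm(X - candidate, axis=1).min()

def crossover(a, b):
    """Uniform crossover: each feature comes from either parent."""
    mask = rng.random(a.shape) < 0.5
    return np.where(mask, a, b)

def mutate(c, scale=0.1):
    """Gaussian mutation of one randomly chosen feature."""
    c = c.copy()
    c[rng.integers(len(c))] += rng.normal(0.0, scale)
    return c

def evolve(X, pop_size=50, generations=100):
    """Evolve a population of synthetic instances seeded from real data."""
    pop = X[rng.choice(len(X), pop_size)] + rng.normal(0, 0.05, (pop_size, X.shape[1]))
    for _ in range(generations):
        fitness = np.array([plausibility(p, X) for p in pop])
        parents = pop[np.argsort(fitness)[-pop_size // 2:]]  # truncation selection
        pop = np.array([mutate(crossover(*parents[rng.choice(len(parents), 2)]))
                        for _ in range(pop_size)])
    return pop
```

Truncation selection is used here only for brevity; any standard selection scheme (e.g., tournament) would fit the same loop.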
Notes
- 2. The order of the sensitive attributes may affect the set of removed instances. GenFair guarantees the removal of the instances closest to the decision boundary only for the first sensitive attribute given as input; for the subsequent ones, the instances already removed might not be those closest to the boundary. The user can, however, specify the order in which the sensitive attributes are considered (a sketch illustrating this order effect follows these notes).
- 3. In the extreme case where no instances with the combination \(\kappa\) are featured in D, such a medoid does not exist; the algorithm falls back to the medoid representative of the entire dataset D (see the medoid sketch after these notes).
- 4. GitHub repository: https://github.com/FedericoMz/GenFair.
- 5. Datasets from Kaggle. The adult and compas datasets are pre-processed as in [19]. For german, categorical attributes are label-encoded, while Age is binarized (a pre-processing sketch follows these notes).
- 6. In tables, the best results are in bold, second-best in italics. \(\uparrow\) and \(\downarrow\) indicate whether the measure should be maximized or minimized, while \(\rightarrow 0\) and \(\rightarrow 1\) indicate whether the ideal value is close to 0 or 1.
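As referenced in note 2, the following sketch illustrates the order effect of per-attribute removal. The ranker, the per-attribute group masks, and the removal budget are all hypothetical choices, not GenFair's actual procedure.

```python
# Hypothetical illustration of note 2's order effect: instances are pruned
# attribute by attribute, so earlier removals can take away the instances
# that would have been closest to the boundary for later attributes.
import numpy as np
from sklearn.linear_model import LogisticRegression

def remove_near_boundary(X, y, group_masks, budget=10):
    """X: feature matrix; y: binary class labels; group_masks: one boolean
    mask per sensitive attribute (the group to prune), in the order chosen
    by the user; budget: removals per attribute (an assumed parameter)."""
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    margin = np.abs(clf.decision_function(X))  # closeness to the boundary
    keep = np.ones(len(X), dtype=bool)
    for g in group_masks:                      # the order matters here
        pool = np.where(keep & g)[0]           # still-kept members of the group
        drop = pool[np.argsort(margin[pool])[:budget]]
        keep[drop] = False                     # may remove instances that a later
                                               # attribute would have selected
    return X[keep], y[keep]
```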
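The fallback of note 3 amounts to the following; `medoid` and `representative` are hypothetical helper names, not taken from GenFair's code.

```python
# Sketch of note 3's medoid fallback. Helper names are hypothetical.
import numpy as np

def medoid(X):
    """The instance of X minimizing the total distance to all others."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return X[dists.sum(axis=1).argmin()]

def representative(D, sensitive, kappa):
    """Medoid of the instances of D whose sensitive-attribute values match
    the combination kappa; if no instance matches, fall back to the medoid
    of the entire dataset D."""
    mask = np.all(sensitive == kappa, axis=1)
    return medoid(D[mask]) if mask.any() else medoid(D)
```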
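Note 5's german pre-processing might look like the sketch below, assuming a pandas DataFrame with an Age column; the binarization threshold of 25 is a common choice in the fairness literature but is an assumption here, as the note does not state it.

```python
# Sketch of note 5's german pre-processing. The Age threshold of 25 is an
# assumption; the note only states that Age is binarized.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess_german(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    for col in df.select_dtypes(include="object").columns:
        df[col] = LabelEncoder().fit_transform(df[col])  # label-encode categoricals
    df["Age"] = (df["Age"] > 25).astype(int)             # binarize Age
    return df
```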
References
Agarwal, A., et al.: A reductions approach to fair classification. In: ICML. Proceedings of Machine Learning Research, vol. 80, pp. 60–69. PMLR (2018)
Ball-Burack, A., et al.: Differential tweetment: mitigating racial dialect bias in harmful tweet detection. In: FAccT, pp. 116–128. ACM (2021)
Berk, R., et al.: Fairness in criminal justice risk assessments: the state of the art. Sociol. Methods Res. 50(1), 3–44 (2021)
La Cava, W., et al.: Genetic programming approaches to learning fair classifiers. In: GECCO, pp. 967–975. ACM (2020)
Chakraborty, J., et al.: Bias in machine learning software: why? How? What to do? In: ESEC/SIGSOFT FSE, pp. 429–440. ACM (2021)
Cinquini, M., Guidotti, R.: CALIME: causality-aware local interpretable model-agnostic explanations. CoRR abs/2212.05256 (2022)
Dablain, D., et al.: Towards a holistic view of bias in machine learning: bridging algorithmic fairness and imbalanced learning. CoRR abs/2207.06084 (2022)
Fan, M., et al.: Explanation-guided fairness testing through genetic algorithm. In: ICSE, pp. 871–882. ACM (2022)
Hardt, M., et al.: Equality of opportunity in supervised learning. In: NIPS, pp. 3315–3323 (2016)
Kamiran, F., et al.: Classification with no discrimination by preferential sampling. In: Proceedings of the 19th ML Conference Belgium and The Netherlands, vol. 1. Citeseer (2010)
Katoch, S., et al.: A review on genetic algorithm: past, present, and future. Multimed. Tools Appl. 80, 8091–8126 (2021)
Lim, S.M., et al.: Crossover and mutation operators of genetic algorithms. Int. J. Mach. Learn. Comput. 7(1), 9–12 (2017)
Liu, F.T., et al.: Isolation forest. In: ICDM, pp. 413–422. IEEE CS (2008)
Mehrabi, N., et al.: A survey on bias and fairness in machine learning. ACM Comput. Surv. 54(6), 115:1–115:35 (2021)
Ntoutsi, E.: Bias in AI-systems: a multi-step approach. In: NL4XAI. ACL (2020)
Patki, N., et al.: The synthetic data vault. In: DSAA, pp. 399–410. IEEE (2016)
Pedreschi, D., Ruggieri, S., Turini, F.: Discrimination-aware data mining. In: KDD, pp. 560–568. ACM (2008)
Pessach, D., et al.: A review on fairness in machine learning. ACM Comput. Surv. (CSUR) 55(3), 1–44 (2022)
Quy, T.L., Roy, A., Iosifidis, V., Zhang, W., Ntoutsi, E.: A survey on datasets for fairness-aware machine learning. WIREs Data Min. Knowl. Discov. 12(3) (2022)
Raquel, C.R., et al.: An effective use of crowding distance in multiobjective particle swarm optimization. In: GECCO, pp. 257–264. ACM (2005)
Salazar, T., et al.: FAWOS: fairness-aware oversampling algorithm based on distributions of sensitive attributes. IEEE Access 9, 81370–81379 (2021)
Sharma, S., et al.: Data augmentation for discrimination prevention and bias disambiguation. In: AIES, pp. 358–364. ACM (2020)
Tan, P.N., et al.: Introduction to data mining. Pearson Education India (2016)
Verma, S., et al.: A comprehensive review on NSGA-II for multi-objective combinatorial optimization problems. IEEE Access 9, 57757–57791 (2021)
Verma, S., et al.: Fairness definitions explained. In: FairWare, pp. 1–7. ACM (2018)
Wang, T., et al.: Augmented fairness: an interpretable model augmenting decision-makers' fairness. arXiv preprint arXiv:2011.08398 (2020)
Acknowledgment
This work is partially supported by the EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research) and PNRR-SoBigData.it - Prot. IR0000013, and by H2020-INFRAIA-2019-1: Res. Infr. G.A. 871042 SoBigData++.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mazzoni, F., Manerba, M.M., Cinquini, M., Guidotti, R., Ruggieri, S. (2023). GenFair: A Genetic Fairness-Enhancing Data Generation Framework. In: Bifet, A., Lorena, A.C., Ribeiro, R.P., Gama, J., Abreu, P.H. (eds) Discovery Science. DS 2023. Lecture Notes in Computer Science(), vol 14276. Springer, Cham. https://doi.org/10.1007/978-3-031-45275-8_24
DOI: https://doi.org/10.1007/978-3-031-45275-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45274-1
Online ISBN: 978-3-031-45275-8
eBook Packages: Computer Science, Computer Science (R0)