Abstract
Bias in the training data can be inherited by machine learning models and then reproduced in socially sensitive decision-making tasks, leading to potentially discriminatory decisions. State-of-the-art pre-processing methods for mitigating unfairness in datasets mainly consider a single binary sensitive attribute. We devise GenFair, a fairness-enhancing data pre-processing method able to deal with two or more sensitive attributes, possibly multi-valued, at once. The core of the approach is a genetic algorithm for instance generation, which accounts for the plausibility of the synthetic instances w.r.t. the distribution of the original dataset. Results show that GenFair is on par with, or even better than, state-of-the-art approaches.
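The core genetic loop can be pictured with a minimal sketch. The fitness proxy, operators, and parameters below are assumptions for illustration only, not GenFair's actual implementation; in particular, a real fitness would also reward fairness objectives (balancing sensitive groups), not plausibility alone.

```python
# A minimal, illustrative genetic loop for plausibility-aware instance
# generation. NOT GenFair's implementation: the fitness proxy, operators,
# and parameters are assumptions chosen only to convey the idea.
import numpy as np

rng = np.random.default_rng(42)

def plausibility(candidate, X):
    """Crude plausibility proxy: negated distance to the nearest
    real instance, so candidates near the data manifold score higher."""
    return -np.linalg.norm(X - candidate, axis=1).min()

def crossover(a, b):
    """Uniform crossover: each feature comes from either parent."""
    mask = rng.random(a.shape) < 0.5
    return np.where(mask, a, b)

def mutate(c, scale=0.1):
    """Gaussian mutation of one randomly chosen feature."""
    c = c.copy()
    c[rng.integers(len(c))] += rng.normal(0.0, scale)
    return c

def evolve(X, pop_size=50, generations=100):
    """Evolve a population of synthetic instances seeded from real data."""
    pop = X[rng.choice(len(X), pop_size)] + rng.normal(0, 0.05, (pop_size, X.shape[1]))
    for _ in range(generations):
        fitness = np.array([plausibility(p, X) for p in pop])
        parents = pop[np.argsort(fitness)[-pop_size // 2:]]  # truncation selection
        pop = np.array([mutate(crossover(*parents[rng.choice(len(parents), 2)]))
                        for _ in range(pop_size)])
    return pop
```

Truncation selection is used here only for brevity; any standard selection scheme (e.g., tournament) would fit the same loop.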
Notes
- 2. The order of the sensitive attributes may affect the set of removed instances. GenFair guarantees the removal of the instances closest to the decision boundary only for the first sensitive attribute given as input; for the subsequent ones, the instances already removed might not be those closest to the boundary. The user can, however, specify the order in which the sensitive attributes are considered (a sketch illustrating this order effect follows these notes).
- 3. In the extreme case where no instances with the combination \(\kappa\) are featured in D, such a medoid does not exist; the algorithm falls back to the medoid representative of the entire dataset D (see the medoid sketch after these notes).
- 4. GitHub repository: https://github.com/FedericoMz/GenFair.
- 5. Datasets from Kaggle. The adult and compas datasets are pre-processed as in [19]. For german, categorical attributes are label-encoded, while Age is binarized (a pre-processing sketch follows these notes).
- 6. In tables, the best results are in bold, second-best in italics. \(\uparrow\) and \(\downarrow\) indicate whether the measure should be maximized or minimized, while \(\rightarrow 0\) and \(\rightarrow 1\) indicate whether the ideal value is close to 0 or 1.
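As referenced in note 2, the following sketch illustrates the order effect of per-attribute removal. The ranker, the per-attribute group masks, and the removal budget are all hypothetical choices, not GenFair's actual procedure.

```python
# Hypothetical illustration of note 2's order effect: instances are pruned
# attribute by attribute, so earlier removals can take away the instances
# that would have been closest to the boundary for later attributes.
import numpy as np
from sklearn.linear_model import LogisticRegression

def remove_near_boundary(X, y, group_masks, budget=10):
    """X: feature matrix; y: binary class labels; group_masks: one boolean
    mask per sensitive attribute (the group to prune), in the order chosen
    by the user; budget: removals per attribute (an assumed parameter)."""
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    margin = np.abs(clf.decision_function(X))  # closeness to the boundary
    keep = np.ones(len(X), dtype=bool)
    for g in group_masks:                      # the order matters here
        pool = np.where(keep & g)[0]           # still-kept members of the group
        drop = pool[np.argsort(margin[pool])[:budget]]
        keep[drop] = False                     # may remove instances that a later
                                               # attribute would have selected
    return X[keep], y[keep]
```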
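The fallback of note 3 amounts to the following; `medoid` and `representative` are hypothetical helper names, not taken from GenFair's code.

```python
# Sketch of note 3's medoid fallback. Helper names are hypothetical.
import numpy as np

def medoid(X):
    """The instance of X minimizing the total distance to all others."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return X[dists.sum(axis=1).argmin()]

def representative(D, sensitive, kappa):
    """Medoid of the instances of D whose sensitive-attribute values match
    the combination kappa; if no instance matches, fall back to the medoid
    of the entire dataset D."""
    mask = np.all(sensitive == kappa, axis=1)
    return medoid(D[mask]) if mask.any() else medoid(D)
```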
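Note 5's german pre-processing might look like the sketch below, assuming a pandas DataFrame with an Age column; the binarization threshold of 25 is a common choice in the fairness literature but is an assumption here, as the note does not state it.

```python
# Sketch of note 5's german pre-processing. The Age threshold of 25 is an
# assumption; the note only states that Age is binarized.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess_german(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    for col in df.select_dtypes(include="object").columns:
        df[col] = LabelEncoder().fit_transform(df[col])  # label-encode categoricals
    df["Age"] = (df["Age"] > 25).astype(int)             # binarize Age
    return df
```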
References
Agarwal, A., et al.: A reductions approach to fair classification. In: ICML. Proceedings of Machine Learning Research, vol. 80, pp. 60–69. PMLR (2018)
Ball-Burack, A., et al.: Differential tweetment: mitigating racial dialect bias in harmful tweet detection. In: FAccT, pp. 116–128. ACM (2021)
Berk, R., et al.: Fairness in criminal justice risk assessments: the state of the art. Sociol. Methods Res. 50(1), 3–44 (2021)
La Cava, W., et al.: Genetic programming approaches to learning fair classifiers. In: GECCO, pp. 967–975. ACM (2020)
Chakraborty, J., et al.: Bias in machine learning software: why? How? What to do? In: ESEC/SIGSOFT FSE, pp. 429–440. ACM (2021)
Cinquini, M., Guidotti, R.: CALIME: causality-aware local interpretable model-agnostic explanations. CoRR abs/2212.05256 (2022)
Dablain, D., et al.: Towards a holistic view of bias in machine learning: bridging algorithmic fairness and imbalanced learning. CoRR abs/2207.06084 (2022)
Fan, M., et al.: Explanation-guided fairness testing through genetic algorithm. In: ICSE, pp. 871–882. ACM (2022)
Hardt, M., et al.: Equality of opportunity in supervised learning. In: NIPS, pp. 3315–3323 (2016)
Kamiran, F., et al.: Classification with no discrimination by preferential sampling. In: Proceedings of the 19th ML Conference Belgium and The Netherlands, vol. 1. Citeseer (2010)
Katoch, S., et al.: A review on genetic algorithm: past, present, and future. Multimed. Tools Appl. 80, 8091–8126 (2021)
Lim, S.M., et al.: Crossover and mutation operators of genetic algorithms. Int. J. Mach. Learn. Comput. 7(1), 9–12 (2017)
Liu, F.T., et al.: Isolation forest. In: ICDM, pp. 413–422. IEEE CS (2008)
Mehrabi, N., et al.: A survey on bias and fairness in machine learning. ACM Comput. Surv. 54(6), 115:1–115:35 (2021)
Ntoutsi, E.: Bias in AI-systems: a multi-step approach. In: NL4XAI. ACL (2020)
Patki, N., et al.: The synthetic data vault. In: DSAA, pp. 399–410. IEEE (2016)
Pedreschi, D., Ruggieri, S., Turini, F.: Discrimination-aware data mining. In: KDD, pp. 560–568. ACM (2008)
Pessach, D., et al.: A review on fairness in machine learning. ACM Comput. Surv. (CSUR) 55(3), 1–44 (2022)
Quy, T.L., Roy, A., Iosifidis, V., Zhang, W., Ntoutsi, E.: A survey on datasets for fairness-aware machine learning. WIREs Data Min. Knowl. Discov. 12(3) (2022)
Raquel, C.R., et al.: An effective use of crowding distance in multiobjective particle swarm optimization. In: GECCO, pp. 257–264. ACM (2005)
Salazar, T., et al.: FAWOS: fairness-aware oversampling algorithm based on distributions of sensitive attributes. IEEE Access 9, 81370–81379 (2021)
Sharma, S., et al.: Data augmentation for discrimination prevention and bias disambiguation. In: AIES, pp. 358–364. ACM (2020)
Tan, P.N., et al.: Introduction to data mining. Pearson Education India (2016)
Verma, S., et al.: A comprehensive review on NSGA-II for multi-objective combinatorial optimization problems. IEEE Access 9, 57757–57791 (2021)
Verma, S., et al.: Fairness definitions explained. In: FairWare, pp. 1–7. ACM (2018)
Wang, T., et al.: Augmented fairness: an interpretable model augmenting decision-makers' fairness. arXiv preprint arXiv:2011.08398 (2020)
Acknowledgment
This work is partially supported by the EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research) and PNRR-SoBigData.it - Prot. IR0000013, and by H2020-INFRAIA-2019-1: Res. Infr. G.A. 871042 SoBigData++.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mazzoni, F., Manerba, M.M., Cinquini, M., Guidotti, R., Ruggieri, S. (2023). GenFair: A Genetic Fairness-Enhancing Data Generation Framework. In: Bifet, A., Lorena, A.C., Ribeiro, R.P., Gama, J., Abreu, P.H. (eds) Discovery Science. DS 2023. Lecture Notes in Computer Science(), vol 14276. Springer, Cham. https://doi.org/10.1007/978-3-031-45275-8_24
DOI: https://doi.org/10.1007/978-3-031-45275-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45274-1
Online ISBN: 978-3-031-45275-8
eBook Packages: Computer Science, Computer Science (R0)