
GenFair: A Genetic Fairness-Enhancing Data Generation Framework

  • Conference paper
  • In: Discovery Science (DS 2023)

Abstract

Bias in the training data can be inherited by Machine Learning models and then reproduced in socially sensitive decision-making tasks, leading to potentially discriminatory decisions. State-of-the-art pre-processing methods to mitigate unfairness in datasets mainly consider a single binary sensitive attribute. We devise GenFair, a fairness-enhancing data pre-processing method able to deal with two or more sensitive attributes, possibly multi-valued, at once. The core of the approach is a genetic algorithm for instance generation, which accounts for the plausibility of the synthetic instances w.r.t. the distribution of the original dataset. Results show that GenFair is on par with, or even better than, state-of-the-art approaches.
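
As a rough illustration of the kind of generation loop the abstract describes, here is a minimal genetic-algorithm sketch in Python. Everything in it is an assumption made for exposition: the elitist selection, the uniform crossover, the gene-resampling mutation, and the `fitness` callback (which would combine plausibility w.r.t. the original data with the fairness gain an instance brings) are not taken from the GenFair implementation.

```python
import random

def genetic_generation(population, domains, fitness,
                       n_generations=50, mutation_rate=0.1):
    """Evolve a pool of synthetic instances (lists of feature values).
    `fitness` is assumed to score an instance by its plausibility w.r.t.
    the original data plus the fairness improvement it would bring."""
    for _ in range(n_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: len(ranked) // 2]          # keep the fitter half
        children = []
        while len(parents) + len(children) < len(population):
            a, b = random.sample(parents, 2)
            child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if random.random() < mutation_rate:       # mutate one gene by
                i = random.randrange(len(child))      # resampling from its
                child[i] = random.choice(domains[i])  # feature domain
            children.append(child)
        population = parents + children
    return population
```

Uniform crossover and domain-resampling mutation are standard operators surveyed in [11, 12]; the operators actually used by GenFair may differ.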


Notes

  1. https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.

  2. The order in which sensitive attributes are considered may affect the set of instances removed. GenFair guarantees to remove the instances closest to the decision boundary for the first sensitive attribute given as input; for subsequent attributes, the instances removed might not be the closest to the decision boundary, because some instances have already been removed. However, the user can specify the order in which the sensitive attributes are considered (see the first sketch after these notes).

  3. In the extreme case where no instances with the combination \(\kappa\) are featured in D, such a medoid does not exist; the algorithm falls back to the medoid representative of the entire dataset D (see the second sketch after these notes).

  4. GitHub repository: https://github.com/FedericoMz/GenFair.

  5. Datasets from Kaggle; adult and compas are pre-processed as in [19]. For german, categorical attributes are label-encoded while Age is binarized (see the third sketch after these notes).

  6. In tables, the best results are in bold and the second-best in italics. \(\uparrow\) and \(\downarrow\) indicate whether the measure should be maximized or minimized, while \(\rightarrow 0\) and \(\rightarrow 1\) indicate whether the ideal value is close to 0 or 1.

  7. https://fat-forensics.org/generated/fatf.fairness.data.measures.systemic_bias.html.

  8. https://sdv.dev/.

  9. See: https://sdv.dev/SDV/user_guides/single_table/tabular_preset.html.

References

  1. Agarwal, A., et al.: A reductions approach to fair classification. In: ICML. Proceedings of Machine Learning Research, vol. 80, pp. 60–69. PMLR (2018)
  2. Ball-Burack, A., et al.: Differential tweetment: mitigating racial dialect bias in harmful tweet detection. In: FAccT, pp. 116–128. ACM (2021)
  3. Berk, R., et al.: Fairness in criminal justice risk assessments: the state of the art. Sociol. Methods Res. 50(1), 3–44 (2021)
  4. La Cava, W., et al.: Genetic programming approaches to learning fair classifiers. In: GECCO, pp. 967–975 (2020)
  5. Chakraborty, J., et al.: Bias in machine learning software: why? How? What to do? In: ESEC/SIGSOFT FSE, pp. 429–440. ACM (2021)
  6. Cinquini, M., Guidotti, R.: CALIME: causality-aware local interpretable model-agnostic explanations. CoRR abs/2212.05256 (2022)
  7. Dablain, D., et al.: Towards a holistic view of bias in machine learning: bridging algorithmic fairness and imbalanced learning. CoRR abs/2207.06084 (2022)
  8. Fan, M., et al.: Explanation-guided fairness testing through genetic algorithm. In: ICSE, pp. 871–882 (2022)
  9. Hardt, M., et al.: Equality of opportunity in supervised learning. In: NIPS, pp. 3315–3323 (2016)
  10. Kamiran, F., et al.: Classification with no discrimination by preferential sampling. In: Proceedings of the 19th ML Conference Belgium and The Netherlands, vol. 1. Citeseer (2010)
  11. Katoch, S., et al.: A review on genetic algorithm: past, present, and future. Multimed. Tools Appl. 80, 8091–8126 (2021)
  12. Lim, S.M., et al.: Crossover and mutation operators of genetic algorithms. Int. J. Mach. Learn. Comput. 7(1), 9–12 (2017)
  13. Liu, F.T., et al.: Isolation forest. In: ICDM, pp. 413–422. IEEE CS (2008)
  14. Mehrabi, N., et al.: A survey on bias and fairness in machine learning. ACM Comput. Surv. 54(6), 115:1–115:35 (2021)
  15. Ntoutsi, E.: Bias in AI-systems: a multi-step approach. In: NL4XAI. ACL (2020)
  16. Patki, N., et al.: The synthetic data vault. In: DSAA, pp. 399–410. IEEE (2016)
  17. Pedreschi, D., Ruggieri, S., Turini, F.: Discrimination-aware data mining. In: KDD, pp. 560–568. ACM (2008)
  18. Pessach, D., et al.: A review on fairness in machine learning. ACM Comput. Surv. 55(3), 1–44 (2022)
  19. Quy, T.L., Roy, A., Iosifidis, V., Zhang, W., Ntoutsi, E.: A survey on datasets for fairness-aware machine learning. WIREs Data Min. Knowl. Discov. 12(3) (2022)
  20. Raquel, C.R., et al.: An effective use of crowding distance in multiobjective particle swarm optimization. In: GECCO, pp. 257–264. ACM (2005)
  21. Salazar, T., et al.: FAWOS: fairness-aware oversampling algorithm based on distributions of sensitive attributes. IEEE Access 9, 81370–81379 (2021)
  22. Sharma, S., et al.: Data augmentation for discrimination prevention and bias disambiguation. In: AIES, pp. 358–364. ACM (2020)
  23. Tan, P.N., et al.: Introduction to Data Mining. Pearson Education India (2016)
  24. Verma, S., et al.: A comprehensive review on NSGA-II for multi-objective combinatorial optimization problems. IEEE Access 9, 57757–57791 (2021)
  25. Verma, S., et al.: Fairness definitions explained. In: FairWare, pp. 1–7. ACM (2018)
  26. Wang, T., et al.: Augmented fairness: an interpretable model augmenting decision-makers' fairness. arXiv preprint arXiv:2011.08398 (2020)


Acknowledgment

This work is partially supported by the EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research) and PNRR-SoBigData.it - Prot. IR0000013, and by the H2020-INFRAIA-2019-1 Res. Infr. G.A. 871042 SoBigData++.

Author information


Corresponding author

Correspondence to Federico Mazzoni.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Mazzoni, F., Manerba, M.M., Cinquini, M., Guidotti, R., Ruggieri, S. (2023). GenFair: A Genetic Fairness-Enhancing Data Generation Framework. In: Bifet, A., Lorena, A.C., Ribeiro, R.P., Gama, J., Abreu, P.H. (eds) Discovery Science. DS 2023. Lecture Notes in Computer Science, vol. 14276. Springer, Cham. https://doi.org/10.1007/978-3-031-45275-8_24


  • DOI: https://doi.org/10.1007/978-3-031-45275-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45274-1

  • Online ISBN: 978-3-031-45275-8

  • eBook Packages: Computer Science, Computer Science (R0)
