Abstract
Fair classification is an important topic in machine learning, particularly in ensemble learning research. However, traditional machine learning methods consider neither the bias in the dataset nor the unfairness introduced while training the ensemble model. Therefore, this paper proposes a novel fairness-aware ensemble model (FAEM) based on hybrid sampling and modified two-layer stacking to achieve more equitable predictive performance. To reduce the bias caused by imbalanced datasets, a new hybrid-sampling-based bias-alleviation method is proposed, which removes majority-class samples through cross-validation-based under-sampling and adds generated minority-class samples through sensitive-attribute-based over-sampling. The fairness of FAEM is further improved by a new two-layer stacking-based fairness-aware ensemble learning method, which modifies the individual predictions of the base classifiers in the first stacking layer to alleviate bias. Four datasets and five evaluation metrics were used to evaluate the classification performance and fairness of the model. The experimental results show that the proposed FAEM effectively trades off accuracy for fairness and outperforms the benchmark models by 54.8% on average across the fairness metrics.
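For intuition only, the following is a minimal, hypothetical sketch of such a pipeline assembled from scikit-learn components: cross-validation-based under-sampling of the majority class, sensitive-attribute-aware over-sampling of the minority class, and a two-layer stacking ensemble whose first-layer scores are adjusted before the meta-classifier. The synthetic data, the choice of base classifiers, and the simple score adjustment are illustrative assumptions and do not reproduce the authors' FAEM implementation.

# Hypothetical sketch in the spirit of FAEM; all details below are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier

def hybrid_sample(X, y, sensitive, rng):
    """Toy hybrid sampling: drop majority samples that a cross-validated
    classifier misclassifies (under-sampling), then replicate minority
    samples of the smaller sensitive group (over-sampling)."""
    # Cross-validation-based under-sampling of the majority class.
    cv_pred = cross_val_predict(DecisionTreeClassifier(random_state=0), X, y, cv=5)
    majority = np.bincount(y).argmax()
    keep = ~((y == majority) & (cv_pred != y))          # drop noisy majority samples
    X, y, sensitive = X[keep], y[keep], sensitive[keep]

    # Sensitive-attribute-aware over-sampling of the minority class.
    minority = np.bincount(y).argmin()
    group = np.bincount(sensitive[y == minority]).argmin()
    idx = np.where((y == minority) & (sensitive == group))[0]
    extra = rng.choice(idx, size=len(idx), replace=True)
    return (np.vstack([X, X[extra]]),
            np.hstack([y, y[extra]]),
            np.hstack([sensitive, sensitive[extra]]))

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
s = rng.integers(0, 2, size=len(y))                     # synthetic sensitive attribute
X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(X, y, s, test_size=0.3, random_state=0)

X_bal, y_bal, s_bal = hybrid_sample(X_tr, y_tr, s_tr, rng)

# First stacking layer: out-of-fold probability scores from several base classifiers.
bases = [DecisionTreeClassifier(random_state=0), RandomForestClassifier(random_state=0)]
meta_features = np.column_stack(
    [cross_val_predict(b, X_bal, y_bal, cv=5, method="predict_proba")[:, 1] for b in bases]
)

# "Modified" first-layer outputs (illustrative): shrink the scores of one
# sensitive group toward the decision boundary before the meta-classifier.
meta_features[s_bal == 1] = 0.9 * meta_features[s_bal == 1] + 0.05

# Second layer: a simple meta-classifier combines the adjusted scores.
for b in bases:
    b.fit(X_bal, y_bal)
meta = LogisticRegression().fit(meta_features, y_bal)

test_meta = np.column_stack([b.predict_proba(X_te)[:, 1] for b in bases])
test_meta[s_te == 1] = 0.9 * test_meta[s_te == 1] + 0.05
print("test accuracy:", meta.score(test_meta, y_te))

The score adjustment above stands in for the paper's modification of the base classifiers' predictions; in practice any group-dependent correction (e.g. threshold shifting per sensitive group) could take its place.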
Data availability
The datasets analyzed during the current study are available in the UCI repository and the ProPublica data store. The German dataset is available at https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german, the Adult dataset at https://archive.ics.uci.edu/ml/machine-learning-databases/adult, the Bank dataset at https://archive.ics.uci.edu/ml/machine-learning-databases/00222, and the Compas dataset at https://github.com/propublica/compas-analysis/blob/master/compas-scores-two-years.csv.
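As a small illustration of how one of these datasets could be loaded, the snippet below reads the German credit data with pandas; the file name german.data under the directory listed above, the whitespace-separated layout, and the column naming are assumptions based on the UCI documentation, not part of this article.

# Minimal sketch, assuming pandas is installed and the UCI file layout is unchanged.
import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data"

# The file is whitespace-separated with 20 attributes plus a 1/2 class label.
cols = [f"attr_{i}" for i in range(1, 21)] + ["credit_risk"]
german = pd.read_csv(URL, sep=r"\s+", header=None, names=cols)

print(german.shape)                         # expected: (1000, 21)
print(german["credit_risk"].value_counts()) # 1 = good credit, 2 = bad credit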
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 51975512), the Zhejiang Natural Science Foundation of China (No. LZ20E050001), and the Zhejiang Key R&D Program of China (Nos. 2022C03166 and 2021C03153).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, W., He, F. & Zhang, S. A novel fairness-aware ensemble model based on hybrid sampling and modified two-layer stacking for fair classification. Int. J. Mach. Learn. & Cyber. 14, 3883–3896 (2023). https://doi.org/10.1007/s13042-023-01870-1