Fairness in vulnerable attribute prediction on social media

Beiró, Mariano G.; Kalimeri, Kyriaki

doi:10.1007/s10618-022-00855-y

Published: 17 September 2022

Volume 36, pages 2194–2213, (2022)

Data Mining and Knowledge Discovery

Abstract

Historically, policymakers and practitioners relied exclusively on survey and census data to design and plan assistive interventions; now, social media offer a timely and cost-effective way to reach populations that would otherwise go unobserved. This study was designed to address the needs of a not-for-profit organisation that seeks to reach young unemployed individuals in Italy with educational and job opportunities via communication channels that are more likely to appeal to younger generations. To this end, we developed an ad hoc Facebook application which administers questionnaires while gathering data about Likes on Facebook Pages. We then developed a machine learning framework that successfully predicts the unemployment status of an unseen individual (.74 AUC). However, blindly delegating the communication intervention to the machine learning model may lead to digital discrimination on the basis of socio-demographic characteristics. Here, we propose a framework that aims to optimise both the prediction performance and the most adequate fairness metric. Our framework is based on an adaptive threshold for gender, and we show that it can be extended to other socio-demographic attributes and generalised to other interventions of an assistive character. We present a doubly cross-validated setting that achieves out-of-sample stability and generalisability of results. We compare the behaviour of models that infer from different sets of data and provide an in-depth discussion of the most predictive features, demonstrating that the “fairness through unawareness” approach does not suffice to achieve a fair classification, since sensitive demographic information can be inferred not only from other socio-demographic attributes but also from behavioural digital patterns.
Finally, we thoroughly assess the behaviour of the adaptive threshold approach and discuss in depth the advantages, but also the implications, of such models, offering actionable insights. Our results show that fairness metrics should be carefully assessed, particularly when AI models are employed for policymaking.
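The adaptive threshold idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which fairness metric the authors selected, so the code below assumes equal true positive rates across gender groups (equal opportunity) purely for illustration. The function names, the target rate, and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def true_positive_rate(scores: List[float], labels: List[int], threshold: float) -> float:
    """Share of actual positives (label 1) whose score reaches the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    return sum(1 for s in positives if s >= threshold) / len(positives)


def tpr_by_group(scores, labels, groups, thresholds: Dict[str, float]) -> Dict[str, float]:
    """True positive rate of each group under that group's own threshold."""
    rates = {}
    for g in sorted(set(groups)):
        g_scores = [s for s, grp in zip(scores, groups) if grp == g]
        g_labels = [y for y, grp in zip(labels, groups) if grp == g]
        rates[g] = true_positive_rate(g_scores, g_labels, thresholds[g])
    return rates


def adaptive_thresholds(scores, labels, groups, target_tpr: float) -> Dict[str, float]:
    """Per group, pick the observed score whose TPR is closest to target_tpr.

    Candidates are scanned from high to low, so ties are resolved in favour
    of the highest threshold (the one that flags the fewest people).
    """
    thresholds = {}
    for g in sorted(set(groups)):
        g_scores = [s for s, grp in zip(scores, groups) if grp == g]
        g_labels = [y for y, grp in zip(labels, groups) if grp == g]
        candidates = sorted(set(g_scores), reverse=True)
        thresholds[g] = min(
            candidates,
            key=lambda t: abs(true_positive_rate(g_scores, g_labels, t) - target_tpr),
        )
    return thresholds


# Invented toy data: model scores, true status (1 = unemployed), and gender.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.45, 0.3, 0.2]
labels = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

# A single shared threshold treats the two groups unequally on this data.
shared = tpr_by_group(scores, labels, groups, {"F": 0.5, "M": 0.5})

# Group-specific thresholds chosen to reach the same (assumed) target TPR.
thresholds = adaptive_thresholds(scores, labels, groups, target_tpr=1.0)
adapted = tpr_by_group(scores, labels, groups, thresholds)

print(shared, thresholds, adapted)
```

In a real pipeline the thresholds would be selected on held-out folds rather than on the data used for evaluation, in the spirit of the doubly cross-validated setting the abstract mentions.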

Notes

https://ec.europa.eu/social/main.jsp?catId=1036
We conventionally refer to the AUROC values as “accuracy” throughout this paper.
The gender attribute is considered to be a binary variable since very few participants opted for the “Other” option.
A comparison between the geographical distribution of our sample per region and the expected values from the official Census is shown in the Supplementary Materials.
This choice is based on the fact that neither group actively searches for a job.
Link to the list of categories: https://developers.facebook.com/docs/commerce-platform/catalog/categories/google-product-category-to-facebook-product-category
The full ranges for each hyperparameter are reported in the Supplementary Materials.
All experiments are performed in Python (Van Rossum and Drake 2009) with scikit-learn (Pedregosa et al. 2011).
The baseline AUC for our tasks is .50.
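The notes above report performance as AUROC with a chance baseline of .50. As a rough illustration of why the baseline sits at .50, the sketch below computes AUROC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The paper states that experiments use scikit-learn; this pure-Python function and its toy inputs are a hypothetical stand-in, not the authors' code.

```python
from typing import List


def auroc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy labels and two scorers: one uninformative, one that ranks perfectly.
toy_labels = [1, 0, 1, 0]
uninformative = auroc([0.5, 0.5, 0.5, 0.5], toy_labels)
perfect = auroc([0.9, 0.1, 0.8, 0.2], toy_labels)

print(uninformative, perfect)
```

A scorer that cannot separate the classes lands at the .50 baseline, while a scorer that ranks every positive above every negative reaches 1.0.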

References

Agarwal A, Beygelzimer A, Dudík M, Langford J, Wallach H (2018) A reductions approach to fair classification. In: International Conference on Machine Learning, pp 60–69. PMLR
Aiken E, Bellue S, Karlan D, Udry C, Blumenstock JE (2022) Machine learning and phone data can improve targeting of humanitarian aid. Nature 1–7
Akintande OJ (2021) Algorithm fairness through data inclusion, participation, and reciprocity. In: International Conference on Database Systems for Advanced Applications, Springer, pp 633–637
Baeza-Yates R, Ribeiro-Neto B et al (1999) Modern Information Retrieval, vol 463. ACM Press, New York
Barocas S, Selbst AD (2016) Big data’s disparate impact. Calif L Rev 104:671
Becker GS (2010) The Economics of Discrimination. University of Chicago Press, Chicago
Bento M, Martinez LM, Martinez LF (2018) Brand engagement and search for brands on social media: Comparing generations x and y in portugal. J of Retailing and Consum Serv 43:234–241
Beutel A, Chen J, Doshi T, Qian H, Woodruff A, Luu C, Kreitmann P, Bischof J, Chi EH (2019) Putting fairness principles into practice: Challenges, metrics, and improvements. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp 453–459
Bi B, Shokouhi M, Kosinski M, Graepel T (2013) Inferring the demographics of search users: Social data meets search queries. In: Proceedings of the 22nd International Conference on World Wide Web. WWW ’13, ACM, New York, NY, USA, pp 131–140. https://doi.org/10.1145/2488388.2488401
Bokányi E, Lábszki Z, Vattay G (2017) Prediction of employment and unemployment rates from twitter daily rhythms in the us. EPJ Data Sci 6(1):14
Bonanomi A, Rosina A, Cattuto C, Kalimeri K (2017) Understanding youth unemployment in italy via social media data. In: 28th IUSSP International Population Conference, Cape Town, South Africa
Calders T, Verwer S (2010) Three naive bayes approaches for discrimination-free classification. Data mining and knowl discov 21(2):277–292
Chhabra A, Masalkovaitė K, Mohapatra P (2021) An overview of fairness in clustering. IEEE Access
Chouldechova A (2017) Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data 5(2):153–163
Corbett-Davies S, Pierson E, Feller A, Goel S, Huq A (2017) Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’17, Association for Computing Machinery, New York, NY, USA pp 797–806. https://doi.org/10.1145/3097983.3098095
Desiere S, Langenbucher K, et al. (2018) Profiling tools for early identification of jobseekers who need extra support. OECD Policy Brief on Activation Policies (dec) 1–4
Desiere S, Struyven L (2020) Using artificial intelligence to classify jobseekers: The accuracy-equity trade-off. Journal Of Social Policy
Dong Y, Yang Y, Tang J, Yang Y, Chawla NV (2014) Inferring user demographics and social strategies in mobile social networks. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, USA, pp 15–24. https://doi.org/10.1145/2623330.2623703
Dutta S, Wei D, Yueksel H, Chen P-Y, Liu S, Varshney K (2020) Is there a trade-off between fairness and accuracy? a perspective using mismatched hypothesis testing. In: International Conference on Machine Learning, pp 2803–2813. PMLR
Eslami M, Krishna Kumaran SR, Sandvig C, Karahalios K (2018) Communicating algorithmic process in online behavioral advertising. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp 1–13
Fatehkia M, Kashyap R, Weber I (2018) Using facebook ad data to track the global digital gender gap. World Dev 107:189–209
Fatehkia M, Coles B, Ofli F, Weber I (2020) The relative value of facebook advertising data for poverty mapping. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 14, pp 934–938
Felbo B, Sundsøy P, Lehmann S, de Montjoye Y-A et al. (2017) Modeling the temporal nature of human behavior for demographics prediction. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp 140–152
Gao J, Zhang Y-C, Zhou T (2019) Computational socioeconomics. Physics Reports
Goel S, Hofman J, Sirer MI (2012) Who does what on the web: Studying web browsing behavior at scale. In: International Conference on Weblogs and Social Media, pp 130–137
Goyat S (2011) The basis of market segmentation: A critical review of literature. Eur J of Bus and Management 3(9):45–54
Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS’16, Red Hook, NY, USA, pp 3323–3331
ISTAT (2020) ISTAT Database. Data on unemployed rate. http://dati.istat.it
Kalimeri K, Beiró MG, Delfino M, Raleigh R, Cattuto C (2019) Predicting demographics, moral foundations, and human values from digital behaviours. Comput in Human Behav 92:428–445
Kalimeri K, Beiró MG, Bonanomi A, Rosina A, Cattuto C (2020) Traditional versus facebook-based surveys: Evaluation of biases in self-reported demographic and psychometric information. Demogr Res 42(5):133–148
Kamiran F, Calders T (2012) Data preprocessing techniques for classification without discrimination. Knowl and Inf Syst 33(1):1–33
Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Fairness-aware classifier with prejudice remover regularizer. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp 35–50
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) Lightgbm: A highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, pp 3146–3154
Kilbertus N, Rojas Carulla M, Parascandolo G, Hardt M, Janzing D, Schölkopf B (2017) Avoiding discrimination through causal reasoning. Advances in neural information processing systems 30
Kleinberg J, Mullainathan S, Raghavan M (2016) Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807
Kosinski M, Stillwell D, Graepel T (2013) Private traits and attributes are predictable from digital records of human behavior. Proc of the National Acad of Sci 110(15):5802–5805
Kuhn P (1987) Sex discrimination in labor markets: The role of statistical evidence. The American Economic Review 567–583
Leonelli S, Lovell R, Wheeler BW, Fleming L, Williams H (2021) From fair data to fair data use: Methodological data fairness in health-related social media research. Big Data & Soc 8(1):20539517211010310
Llorente A, Garcia-Herranz M, Cebrian M, Moro E (2015) Social media fingerprints of unemployment. PLOS ONE 10(5):1–13
Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee S-I (2019) Explainable AI for Trees: From Local Explanations to Global Understanding
Lundberg SM, Lee S-I (2017a) A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems 30, pp 4765–4774
Lundberg S, Lee S-I (2017b) A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874
Malmi E, Weber I (2016) You are what apps you use: Demographic prediction based on user’s apps. ICWSM, 635–638
Mason SJ, Graham NE (2002) Areas beneath the relative operating characteristics (roc) and relative operating levels (rol) curves: Statistical significance and interpretation. Quarterly J of the Royal Meteorol Soc 128(584):2145–2166
Matz SC, Menges JI, Stillwell DJ, Schwartz HA (2019) Predicting individual-level income from facebook profiles. PloS one 14(3):0214369
Ntoutsi E, Fafalios P, Gadiraju U, Iosifidis V, Nejdl W, Vidal M-E, Ruggieri S, Turini F, Papadopoulos S, Krasanakis E et al (2020) Bias in data-driven artificial intelligence systems-an introductory survey. Wiley Int Rev: Data Mining and Knowl Discov 10(3):1356
Olteanu A, Castillo C, Diaz F, Kıcıman E (2019) Social data: Biases, methodological pitfalls, and ethical boundaries. Frontiers in Big Data 2:13
Olteanu A, Castillo C, Diaz F, Kiciman E (2016) Social data: Biases, methodological pitfalls, and ethical boundaries. https://doi.org/10.2139/ssrn.2886526
O’Neil C (2016) Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown, New York
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. J of Mach Learning Res 12:2825–2830
Pedreshi D, Ruggieri S, Turini F (2008) Discrimination-aware data mining. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 560–568
Pessach D, Shmueli E (2022) A review on fairness in machine learning. ACM Comput Surveys (CSUR) 55(3):1–44
Rama D, Mejova Y, Tizzoni M, Kalimeri K, Weber I (2020) Facebook ads as a demographic tool to measure the urban-rural divide. In: Proceedings of The Web Conference 2020, pp 327–338
Saleiro P, Kuester B, Stevens A, Anisfeld A, Hinkson L, London J, Ghani R (2018) Aequitas: A bias and fairness audit toolkit. arXiv preprint arXiv:1811.05577
Seneviratne S, Seneviratne A, Mohapatra P, Mahanti A (2015) Your installed apps reveal your gender and more! ACM SIGMOBILE Mobile Comput and Commun Rev 18(3):55–61
Stoll MA, Raphael S, Holzer HJ (2004) Black job applicants and the hiring officer’s race. ILR Rev 57(2):267–287
Sundsøy P, Bjelland J, Reme B-A, Jahani E, Wetter E, Bengtsson L (2016) Estimating individual employment status using mobile phone network data. arXiv preprint arXiv:1612.03870
Toole JL, Lin Y-R, Muehlegger E, Shoag D, González MC, Lazer D (2015) Tracking employment shocks using mobile phone data. J of The Royal Soc Int 12(107):20150185
Urbinati A, Kalimeri K, Bonanomi A, Rosina A, Cattuto C, Paolotti D (2020) Young adult unemployment through the lens of social media: Italy as a case study. In: International Conference on Social Informatics, Springer, Cham, pp 380–396
van Landeghem B, Desiere S, Struyven L (2021) Statistical profiling of unemployed jobseekers. IZA World of Labor, Germany
Van Rossum G, Drake FL (2009) Python 3 Reference Manual. CreateSpace, Scotts Valley, CA
Verma S, Rubin J (2018) Fairness definitions explained. In: 2018 IEEE/ACM International Workshop on Software Fairness (fairware), pp 1–7. IEEE
Wood R, Murch B, Betteridge R (2019) A comparison of population segmentation methods. Oper Res for Health Care 22:100192
Yeung K, Lodge M (2019) The Possibilities of Digital Discrimination: Research on E-commerce, Algorithms and Big Data. Oxford University Press, UK
Ying JJ-C, Chang Y-J, Huang C-M, Tseng VS (2012) Demographic prediction based on users mobile behaviors. Mobile Data Challenge
Zafar MB, Valera I, Gomez Rodriguez M, Gummadi KP (2017) Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In: Proceedings of the 26th International Conference on World Wide Web, pp 1171–1180
Zemel R, Wu Y, Swersky K, Pitassi T, Dwork C (2013) Learning fair representations. In: International Conference on Machine Learning, pp 325–333. PMLR
Zhang BH, Lemoine B, Mitchell M (2018) Mitigating unwanted biases with adversarial learning. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp 335–340
Zhong Y, Yuan NJ, Zhong W, Zhang F, Xie X (2015) You are where you go: Inferring demographic attributes from location check-ins. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. WSDM ’15, ACM, New York, NY, USA, pp 295–304

Acknowledgements

K.K. acknowledges support from the “Lagrange Project” of the ISI Foundation funded by the Fondazione CRT.

Author information

Authors and Affiliations

Facultad de Ingeniería, Universidad de Buenos Aires, Av. Paseo Colón 850, Buenos Aires, Argentina
Mariano G. Beiró
CONICET–Universidad de Buenos Aires, INTECIN, Av. Paseo Colón 850, Buenos Aires, Argentina
Mariano G. Beiró
ISI Foundation, Turin, Italy
Kyriaki Kalimeri

Corresponding authors

Correspondence to Mariano G. Beiró or Kyriaki Kalimeri.

Additional information

Responsible editor: Toon Calders.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 1261 KB)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Beiró, M.G., Kalimeri, K. Fairness in vulnerable attribute prediction on social media. Data Min Knowl Disc 36, 2194–2213 (2022). https://doi.org/10.1007/s10618-022-00855-y

Received: 30 August 2021
Accepted: 05 July 2022
Published: 17 September 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s10618-022-00855-y
