Abstract
To tackle the adverse societal and individual consequences of long-term unemployment, many public employment services (PES) have implemented data-driven profiling systems to identify vulnerable job seekers early. More recently, PES have increasingly turned to more complex machine learning (ML) models because of their higher accuracy. However, concerns are growing about the opacity of these algorithms, which hinders understanding of, and trust in, their predictions. The current study focuses on the explainability of the ML-based profiling model deployed at the Flemish PES (VDAB), which predicts clients’ likelihood of securing sustainable employment. We compare two explainability techniques: (1) TreeSHAP, a state-of-the-art method grounded in the theoretical properties of Shapley values, and (2) TreeInterpreter, a computationally cheaper approximation that forgoes some of these properties. Based on multiple evaluation metrics, our findings suggest that for tree-based models, approximations to SHAP (SHapley Additive exPlanations) values yield very similar insights and maintain explanatory performance while minimizing computational overhead. This enables institutions with large client bases to generate real-time explanations without having to sacrifice model accuracy. Additionally, our analysis identifies key predictors of job seekers’ employment prospects, offering valuable insights for PES and related agencies striving to improve their support for job seekers in need. Clients’ online behavior, which acts as a proxy for hard-to-measure job search intensity and motivation, emerges as a key component of the profiling model and presents promising opportunities for future profiling efforts.
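The difference between the two techniques compared in the abstract can be made concrete on a toy example. The Python sketch below is illustrative only and is not the VDAB pipeline: it hand-codes a hypothetical depth-2 regression tree (the `TREE` dictionary and all its numbers are invented) and attributes a single prediction in two ways. The hypothetical function `saabas` follows the TreeInterpreter idea of crediting each split feature with the change in node mean along the decision path, while the hypothetical function `shapley` enumerates every feature subset to compute exact Shapley values, using the cover-weighted (path-dependent) conditional expectation that TreeSHAP targets. Brute-force enumeration is exponential in the number of features; TreeSHAP obtains the same quantity in polynomial time, and in practice one would rely on established packages such as `shap` (its `TreeExplainer`) and `treeinterpreter` rather than on this code.

```python
from itertools import combinations
from math import factorial

# Hypothetical depth-2 regression tree (all numbers invented for illustration).
# Internal nodes split on feature index "feat" at threshold "thr" (go left if
# x[feat] <= thr). Every node stores its training "cover" (number of samples
# reaching it) and "value" (mean outcome of those samples), so each internal
# value is the cover-weighted mean of its children.
TREE = {
    0: {"feat": 0, "thr": 0.5, "left": 1, "right": 2, "cover": 100, "value": 0.39},
    1: {"feat": 1, "thr": 0.5, "left": 3, "right": 4, "cover": 60, "value": 0.15},
    2: {"feat": 1, "thr": 0.5, "left": 5, "right": 6, "cover": 40, "value": 0.75},
    3: {"cover": 30, "value": 0.10},
    4: {"cover": 30, "value": 0.20},
    5: {"cover": 10, "value": 0.30},
    6: {"cover": 30, "value": 0.90},
}


def child_for(node, x):
    """Return the id of the child that observation x falls into."""
    return node["left"] if x[node["feat"]] <= node["thr"] else node["right"]


def saabas(tree, x, n_features):
    """TreeInterpreter-style attribution: walk the decision path and credit
    each split feature with the change in node mean it causes."""
    contrib = [0.0] * n_features
    node_id = 0
    while "feat" in tree[node_id]:
        node = tree[node_id]
        nxt = child_for(node, x)
        contrib[node["feat"]] += tree[nxt]["value"] - node["value"]
        node_id = nxt
    base, prediction = tree[0]["value"], tree[node_id]["value"]
    return base, contrib, prediction


def cond_expectation(tree, x, known, node_id=0):
    """Path-dependent estimate of E[f(X) | X_S = x_S]: follow x at splits on
    known features, otherwise average both children weighted by cover."""
    node = tree[node_id]
    if "feat" not in node:
        return node["value"]
    if node["feat"] in known:
        return cond_expectation(tree, x, known, child_for(node, x))
    left, right = node["left"], node["right"]
    return (
        tree[left]["cover"] * cond_expectation(tree, x, known, left)
        + tree[right]["cover"] * cond_expectation(tree, x, known, right)
    ) / node["cover"]


def shapley(tree, x, n_features):
    """Exact Shapley values by enumerating all subsets (exponential cost)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k
            weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
            for subset in combinations(others, k):
                s = set(subset)
                gain = cond_expectation(tree, x, s | {i}) - cond_expectation(tree, x, s)
                phi[i] += weight * gain
    base = cond_expectation(tree, x, set())
    return base, phi


if __name__ == "__main__":
    x = [0.8, 0.2]  # hypothetical observation: feature 0 high, feature 1 low
    print("TreeInterpreter:", saabas(TREE, x, 2))
    print("Shapley values: ", shapley(TREE, x, 2))
```

For x = [0.8, 0.2], both methods start from the same base value (0.39) and their attributions sum to the same total (prediction 0.30 minus 0.39), but TreeInterpreter assigns roughly +0.36 and -0.45 to the two features, whereas the Shapley values are roughly +0.24 and -0.33. TreeInterpreter only inspects the single path the observation follows, while Shapley values average each feature's marginal effect over all orderings. For a random forest, both kinds of attribution are averaged over the trees.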




Data Availability Statement
The data that support the findings of this study are available from Vlaamse Dienst voor Arbeidsbemiddeling en Beroepsopleiding (VDAB), but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of Vlaamse Dienst voor Arbeidsbemiddeling en Beroepsopleiding (VDAB).
Acknowledgements
We would like to express our gratitude to Dr. Karolien Scheerlinck, Stijn Van De Velde, Joris Van Den Bossche, and Dieter Verbeemen of the VDAB AI Team for their assistance and feedback provided throughout this research project.
Ethics declarations
Conflict of interest
This research is supported by the Career Management Analytics research chair, sponsored by the Flemish PES (VDAB: Vlaamse Dienst voor Arbeidsbemiddeling en Beroepsopleiding).
Appendix
See Fig. 5.
SHAP decision plot for a local explanation of a client with a predicted employment likelihood of 8.9%. The plot shows the 20 most important drivers behind the model’s decision, with the most influential features plotted at the top. The decision line starts at the base rate (the mean of the outcome variable over all training observations, shown by the vertical line) and incrementally adds the attribution values of the features until the final prediction is reached (colored bar above). The distance between the vertical line and the start of the blue decision line at the bottom of the plot equals the sum of the attribution values of the features left out of the plot. The client’s feature values are shown in brackets (e.g., the most influential feature for this client was the average unemployment duration in previous unemployment episodes, at 1112 days).
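The arithmetic described in this caption can be reproduced in a few lines. The sketch below uses a hypothetical helper, `decision_path` (it is not part of the `shap` library, and it mirrors only the caption's description, not the library's plotting code). It ranks attributions by absolute value, folds the features that are not displayed into the starting point of the line, and then accumulates the displayed features from the bottom of the plot to the top, so the most influential feature is added last.

```python
def decision_path(base_value, attributions, top_k):
    """Compute the points of a SHAP-style decision line.

    base_value:   expected model output (the vertical line in the plot)
    attributions: dict mapping feature name -> attribution value
    top_k:        number of features drawn in the plot

    Returns (start, steps): `start` is where the line begins at the bottom of
    the plot (base value plus the summed attributions of omitted features);
    `steps` lists (feature, cumulative value) from bottom to top.
    """
    # Rank features by absolute attribution, most influential first.
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    shown, omitted = ranked[:top_k], ranked[top_k:]

    # Omitted features shift the starting point away from the base value.
    start = base_value + sum(value for _, value in omitted)

    steps = []
    running = start
    # Walk from the least to the most influential displayed feature,
    # i.e. from the bottom of the plot to the top.
    for name, value in reversed(shown):
        running += value
        steps.append((name, running))
    return start, steps


if __name__ == "__main__":
    # Illustrative numbers only, not the client's actual attributions.
    start, steps = decision_path(
        0.40, {"a": -0.20, "b": 0.05, "c": -0.10, "d": 0.01}, top_k=2
    )
    print(start, steps)
```

Because the attributions are additive, the last cumulative value always equals the base value plus the sum of all attributions, that is, the model's prediction, no matter how many features are displayed.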
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dossche, W., Vansteenkiste, S., Baesens, B. et al. Interpretable and Accurate Identification of Job Seekers at Risk of Long-Term Unemployment: Explainable ML-Based Profiling. SN COMPUT. SCI. 5, 536 (2024). https://doi.org/10.1007/s42979-024-02884-4
DOI: https://doi.org/10.1007/s42979-024-02884-4