Predicting Unemployment with Machine Learning Based on Registry Data

Viljanen, Markus; Pahikkala, Tapio

doi:10.1007/978-3-030-50316-1_21


Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 385)

Included in the following conference series:

International Conference on Research Challenges in Information Science


Abstract

Many statistical models have been developed to understand the causes of unemployment, but predicting unemployment has received less attention. In this study, we develop a model to predict the labour market state of a person based on machine learning trained with a large administrative unemployment registry. The model specifies individuals as Markov chains with person-specific transition rates. We evaluate the model on three tasks, where the goal is to predict who has the highest risk of escaping unemployment, becoming unemployed, and being unemployed at any given time. We obtain good performance (AUC: 0.80) for the machine learning model of lifetime unemployment, and very good performance (AUC: 0.90+) for predictions into the near future when we know the recent labour market state of a person. We find that person information affects the predictions in an intuitive way, but there are still significant differences that can be learned by utilizing labour market histories.



Author information

Authors and Affiliations

Turun Yliopisto, Turku, Finland
Markus Viljanen & Tapio Pahikkala


Corresponding author

Correspondence to Markus Viljanen.

Editor information

Editors and Affiliations

Utrecht University, Utrecht, The Netherlands
Fabiano Dalpiaz
Stockholm University, Kista, Sweden
Jelena Zdravkovic
The Institute of Digital Innovation and Research, Dublin, Ireland
Pericles Loucopoulos

Appendix

Unemployment registry data has a number of potential biases if we want to generalize the results to the entire population. To generalize, the training set should contain all 18-to-64-year-olds currently residing in Varsinais-Suomi with their recurrent unemployment and employment spells. However, the unemployment registry is sampled monthly and contains only people who have been jobseekers at least once during the sampling period.

1.1 The Data Set Includes Only People Who Have Been Unemployed

Persons who have not been job seekers during 2013 to 2017 in the unemployment agencies are missing from the original data set because they do not have a registry entry. This is the case for people who have not been unemployed. We compared the data set to the official yearly statistics in Fig. 6, where we find that about 50% of the labour force in Varsinais-Suomi is missing. The unemployment exit rate is not biased, because every unemployed person is included. However, the baseline unemployment entry rate and prevalence are too high, because many people who are never unemployed are missing as negative examples. In other words, by definition we are analyzing unemployment among all people who experience unemployment at least once during the follow-up period.

It is still possible to estimate the true person-specific rates from model predictions. Assume that the true unemployment entry rate is $\mu_i$ and the exit rate is $\lambda_i$ for person $i$. Denote the length of a 'not unemployed' spell as $T$ and the length of follow-up as $t$. The probability of being missing from the data corresponds to the probability of starting outside unemployment and remaining in that state the entire time: $P(\{X_i(t)=0\}_{t=0,1,...})=P(X_i(0)=0)P(T>t)$. The data contains all of the 'unemployed' observations, $P(X_i(t)=1)=\frac{\mu_i}{\lambda_i+\mu_i}$, but the proportion of 'not unemployed' observations included is only $P(X_i(t)=0)-P(\{X_i(t)=0\}_{t=0,1,...})=P(X_i(0)=0)P(T\le t)=\frac{\lambda_i}{\lambda_i+\mu_i}(1-e^{-\mu_i t})$, where the chain is taken to be stationary and $T$ is exponentially distributed with rate $\mu_i$. Denote the observed odds of unemployment by $\frac{\mu_i^*}{\lambda_i}$, which should be equal to the odds $\frac{P(X_i(t)=1)}{P(X_i(t)=0)P(T\le t)}=\frac{\mu_i}{\lambda_i(1-e^{-\mu_i t})}$. This means we can solve:

$$\frac{\mu_i}{1-e^{-\mu_i t}} = \mu_i^* \tag{16}$$

We then obtain the true rate $\mu_i$ that produces the observed unemployment entry rate $\mu_i^*$. With increasing follow-up $t\longrightarrow \infty$ we gather all samples, and the observed rate $\mu_i^*$ approaches the true rate $\mu_i$.
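Equation (16) has no closed-form solution for $\mu_i$, so in practice it would be solved numerically. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, bisection is our choice of root finder, and the 60-month follow-up and the rate 0.01 in the usage lines are assumed values for illustration only. It relies on the left-hand side being increasing in $\mu_i$ and tending to $1/t$ as $\mu_i \to 0$, so a positive solution exists only when $\mu_i^* > 1/t$.

```python
import math


def observed_entry_rate(mu: float, t: float) -> float:
    """Left-hand side of Eq. (16): mu / (1 - exp(-mu * t))."""
    # -expm1(-x) computes 1 - exp(-x) accurately for small x.
    return mu / -math.expm1(-mu * t)


def true_entry_rate(mu_star: float, t: float, iterations: int = 200) -> float:
    """Solve mu / (1 - exp(-mu * t)) = mu_star for mu by bisection."""
    if mu_star * t <= 1.0:
        raise ValueError("no positive solution: need mu_star > 1 / t")
    # The root lies in (0, mu_star) because the left-hand side exceeds mu.
    lo, hi = 0.0, mu_star
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if observed_entry_rate(mid, t) < mu_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative round trip with assumed values (not taken from the paper).
follow_up_months = 60.0
mu_star = observed_entry_rate(0.01, follow_up_months)
recovered = true_entry_rate(mu_star, follow_up_months)
print(f"observed rate {mu_star:.4f} corresponds to true rate {recovered:.4f}")
```

The round trip starts from a known rate, inflates it with Eq. (16), and checks that the solver recovers the starting value.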

1.2 The Data Set Excludes Some Short Spells

Spells shorter than one calendar month are undersampled because the registry status is recorded monthly. A person may enter and exit unemployment between monthly measurements without being recorded. We estimate what percentage of spells is missing by calculating the Kaplan-Meier estimate $S(t)=P(T>t)$ of the first spell length $T$ in Fig. 7. The second month hazard can be used to estimate the true first month hazard, as shown in the bottom left figure. For example, assuming that the first month hazards should be 0.25/month (exit) and 0.12/month (entry), the percentage of spells that end in the first month should be $1-e^{-0.25}\approx 22\%$ (exit) and $1-e^{-0.12}\approx 11\%$ (entry) instead of the $1-e^{-0.12}\approx 11\%$ (exit) and $1-e^{-0.06}\approx 6\%$ (entry) that were observed. These spells are a small subset of the data set, and short spells do not meaningfully contribute to the total amount of unemployment.
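The percentages in the example above follow from the constant-hazard relation $P(T\le 1)=1-e^{-h}$ for a monthly hazard $h$. The short sketch below reproduces that arithmetic; the helper name and the dictionary layout are ours, and the hazards are the ones quoted in the text.

```python
import math


def share_ending_within(hazard_per_month: float, months: float = 1.0) -> float:
    """P(T <= months) under a constant hazard: 1 - exp(-hazard * months)."""
    return -math.expm1(-hazard_per_month * months)


assumed = {"exit": 0.25, "entry": 0.12}   # first month hazards assumed in the text
observed = {"exit": 0.12, "entry": 0.06}  # hazards implied by the observed shares

for kind in ("exit", "entry"):
    expected_share = share_ending_within(assumed[kind])
    observed_share = share_ending_within(observed[kind])
    print(
        f"{kind}: expected {expected_share:.1%}, observed {observed_share:.1%}, "
        f"difference {expected_share - observed_share:.1%}"
    )
```

Rounded to whole percentages, the shares match the 22%, 11%, 11% and 6% quoted above; the difference column is a rough indication of how many first-month spells go unrecorded under these assumed hazards.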

1.3 The Data Set Includes Some People Who Have Moved Out

Finally, we do not know who remains in or moves out of the Varsinais-Suomi area. The data set may include persons who have moved out and are no longer at risk of being recorded in the unemployment registry. This bias can be estimated with a simple Monte Carlo simulation. From the government migration statistics for the years 2013-2017 (StatFin) we can calculate the migration rates within Finland. Each year on average 2.1% of the Varsinais-Suomi population moved to other economic areas, and 0.21% of the population in other areas moved into Varsinais-Suomi. We assume that the migration of people follows a Markov chain with the corresponding monthly transition probabilities. We then overlay the movement patterns generated from this Markov chain onto the data set, as seen on the left of Fig. 8, and calculate the percentage of people who are outside Varsinais-Suomi each month, shown on the right of Fig. 8. This implies that about 6% of the samples at any given time were probably outside Varsinais-Suomi.
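A simulation of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it uses a synthetic cohort in which everyone starts inside the region and is followed for 60 months, instead of overlaying the chain on the actual registry entries, and it converts yearly to monthly probabilities by assuming the same probability applies independently each month. All function and variable names are hypothetical.

```python
import random


def monthly_probability(yearly_probability: float) -> float:
    """Convert a yearly transition probability to a monthly one, assuming
    the same probability applies independently in each of the 12 months."""
    return 1.0 - (1.0 - yearly_probability) ** (1.0 / 12.0)


def simulate_share_outside(n_people, n_months, p_out, p_in, seed=0):
    """Simulate a two-state migration Markov chain for each person and
    return the share of people outside the region in each month."""
    rng = random.Random(seed)
    outside_counts = [0] * n_months
    for _ in range(n_people):
        outside = False  # assumption: everyone starts inside the region
        for month in range(n_months):
            if outside:
                if rng.random() < p_in:
                    outside = False
            elif rng.random() < p_out:
                outside = True
            outside_counts[month] += outside
    return [count / n_people for count in outside_counts]


p_out = monthly_probability(0.021)   # Varsinais-Suomi -> other areas (2.1% per year)
p_in = monthly_probability(0.0021)   # other areas -> Varsinais-Suomi (0.21% per year)
shares = simulate_share_outside(n_people=5000, n_months=60, p_out=p_out, p_in=p_in)
print(f"mean share outside over follow-up: {sum(shares) / len(shares):.1%}")
```

Because the synthetic cohort differs from the real data set, where people enter the registry at different times, the mean share printed here need not coincide with the figure reported above.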


Cite this paper

Viljanen, M., Pahikkala, T. (2020). Predicting Unemployment with Machine Learning Based on Registry Data. In: Dalpiaz, F., Zdravkovic, J., Loucopoulos, P. (eds) Research Challenges in Information Science. RCIS 2020. Lecture Notes in Business Information Processing, vol 385. Springer, Cham. https://doi.org/10.1007/978-3-030-50316-1_21


DOI: https://doi.org/10.1007/978-3-030-50316-1_21
Published: 25 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50315-4
Online ISBN: 978-3-030-50316-1
eBook Packages: Computer Science, Computer Science (R0)
