Abstract
Many statistical models have been developed to understand the causes of unemployment, but predicting unemployment has received less attention. In this study, we develop a machine learning model that predicts the labour market state of a person and is trained on a large administrative unemployment registry. The model specifies individuals as Markov chains with person-specific transition rates. We evaluate the model on three tasks, where the goal is to predict who has the highest risk of escaping unemployment, becoming unemployed, and being unemployed at any given time. We obtain good performance (AUC: 0.80) for the machine learning model of lifetime unemployment, and very good performance (AUC: 0.90+) for near-future predictions when we know the recent labour market state of a person. We find that person information affects the predictions in an intuitive way, but there are still significant differences that can be learned by utilizing labour market histories.
Appendix
Unemployment registry data has a number of potential biases if we want to generalize the results to the entire population. To generalize, the training set should contain all 18 to 64 year olds currently residing in Varsinais-Suomi with their recurrent unemployment and employment spells. However, the unemployment registry is sampled monthly and contains only people who have been jobseekers at least once during the sampling period.
1.1 The Data Set Includes Only People Who Have Been Unemployed
Persons who were not registered as jobseekers at the unemployment agencies during 2013 to 2017 are missing from the original data set because they have no registry entry. This is the case for people who have not been unemployed. We compared the data set to the official yearly statistics in Fig. 6, where we find that about 50% of the labour force in Varsinais-Suomi is missing. The unemployment exit rate is not biased, because every unemployed person is included. However, the baseline unemployment entry rate and prevalence are too high, because many people who are never unemployed are missing as negative examples. In other words, by definition we are analyzing unemployment among the people who experience unemployment at least once during the follow-up period.
It is still possible to estimate the true person-specific rates from the model predictions. Assume that the true unemployment entry rate is \(\mu _i\) and the exit rate is \(\lambda _i\) for person i. Denote the length of a ‘not unemployed’ spell as T and the length of follow-up as t. The probability of being missing from the data is the probability of starting outside unemployment and remaining in that state for the entire follow-up: \(P(\{X_i(t)=0\}_{t=0,1,...})=P(X_i(0)=0)P(T > t)\). The data contains all of the ‘unemployed’ observations, \(P(X_i(t)=1)=\frac{\mu _i}{\lambda _i+\mu _i}\), but the proportion of ‘not unemployed’ observations included is only \(P(X_i(t)=0)-P(\{X_i(t)=0\}_{t=0,1,...})=P(X_i(0)=0)P(T\le t)=\frac{\lambda _i}{\lambda _i+\mu _i}(1-e^{-\mu _i t})\). Denote the observed odds of unemployment by \(\frac{\mu _i^*}{\lambda _i}\), which should equal the odds \(P(X_i(t)=1)/\left[ P(X_i(t)=0)P(T\le t)\right] =\frac{\mu _i}{\lambda _i (1-e^{-\mu _i t})}\). Cancelling \(\lambda _i\) on both sides, this means we can solve \(\mu _i\) from:
\[\mu _i^* = \frac{\mu _i}{1-e^{-\mu _i t}}\]
We then obtain the true rate \(\mu _i\) that produces the observed unemployment entry rate \(\mu _i^*\). As the follow-up grows, \(t\longrightarrow \infty \), every person is eventually observed and the observed rate approaches the true rate.
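The correction above has no elementary closed form, so it may be easiest to invert numerically. The following is a minimal sketch, not the authors' implementation: it assumes the relation \(\mu _i^* = \mu _i/(1-e^{-\mu _i t})\) implied by the odds above, and the function names and the bisection approach are our own illustrative choices.

```python
import math


def observed_entry_rate(mu: float, t: float) -> float:
    """Inflated entry rate mu* = mu / (1 - exp(-mu * t)) seen in the truncated data."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return mu / -math.expm1(-mu * t)


def true_entry_rate(mu_star: float, t: float, tol: float = 1e-12) -> float:
    """Solve mu* = mu / (1 - exp(-mu * t)) for mu by bisection (hypothetical helper)."""
    # The observed rate can never fall below 1/t, so mu* * t must exceed 1.
    if mu_star * t <= 1.0:
        raise ValueError("no positive solution: mu_star * t must exceed 1")
    # The observed rate is increasing in mu and always larger than mu,
    # so the true rate lies strictly between 0 and mu_star.
    lo, hi = 0.0, mu_star
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if observed_entry_rate(mid, t) < mu_star:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Example: a true monthly entry rate of 0.02 observed over 60 months of follow-up.
mu_star = observed_entry_rate(0.02, 60.0)
mu_recovered = true_entry_rate(mu_star, 60.0)
```

In this example the observed rate is noticeably higher than 0.02, and the inversion recovers the true value. The rates here are illustrative and are not taken from the data.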
1.2 The Data Set Excludes Some Short Spells
Spells shorter than one calendar month are undersampled because the registry status is recorded monthly. A person may enter and exit unemployment between monthly measurements without being recorded. We estimate the percentage of missing spells by calculating the Kaplan-Meier estimate \(S(t)=P(T>t)\) of the first spell length T in Fig. 7. The second month hazard can be used to estimate the true first month hazard, as shown in the bottom left panel. For example, assuming that the first month hazards should be 0.25/month (exit) and 0.12/month (entry), the percentage of spells that end in the first month should be \(1-e^{-0.25}\approx 22\%\) (exit) and \(1-e^{-0.12}\approx 11\%\) (entry) instead of the observed \(1-e^{-0.12}\approx 11\%\) (exit) and \(1-e^{-0.06}\approx 6\%\) (entry). These spells are a small subset of the data set, and short spells do not meaningfully contribute to the total amount of unemployment.
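The percentages quoted above follow from a constant hazard within the month. The sketch below reproduces that arithmetic; the function names are hypothetical, and the conversion from two Kaplan-Meier survival values to a hazard assumes the hazard is constant over the interval between them.

```python
import math


def fraction_ending_within(hazard: float, months: float = 1.0) -> float:
    """Share of spells that end within `months`, given a constant monthly hazard."""
    return -math.expm1(-hazard * months)  # 1 - exp(-hazard * months)


def hazard_from_survival(s_start: float, s_end: float) -> float:
    """Constant hazard implied by survival dropping from s_start to s_end over one month."""
    return -math.log(s_end / s_start)


# Assumed first month hazards (taken from the second month) versus the observed ones.
expected = {"exit": fraction_ending_within(0.25), "entry": fraction_ending_within(0.12)}
observed = {"exit": fraction_ending_within(0.12), "entry": fraction_ending_within(0.06)}

for kind in ("exit", "entry"):
    print(f"{kind}: expected {expected[kind]:.1%}, observed {observed[kind]:.1%}")
```

The gap between the expected and observed shares gives a rough sense of how many first month spells go unrecorded under these assumed hazards.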
1.3 The Data Set Includes Some People Who Have Moved Out
Finally, we do not know who remains in and who moves out of the Varsinais-Suomi area. The data set may include persons who have moved out and are no longer at risk of being recorded in the unemployment registry. This bias can be estimated with a simple Monte Carlo simulation. From the government migration statistics for the years 2013–2017 (StatFin) we can calculate the migration rates within Finland. Each year on average 2.1% of the Varsinais-Suomi population moved to other economic areas, and 0.21% of the population in other areas moved to Varsinais-Suomi. We assume that the migration of people follows a Markov chain with the corresponding monthly transition probabilities. We then overlay the movement patterns generated from this Markov chain onto the data set, as seen on the left of Fig. 8, and calculate the percentage of people who are outside Varsinais-Suomi each month, shown on the right of Fig. 8. This implies that about 6% of the samples at each time were probably outside Varsinais-Suomi.
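A simulation of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the procedure behind Fig. 8: everyone is assumed to start inside the area at month 0, the yearly rates are converted to monthly probabilities by compounding, a person who has moved out is assumed to return at the 0.21% yearly rate, and the movement patterns are not overlaid on the actual registry entry times. The function names, population size, and seed are hypothetical, so the output should not be expected to reproduce the 6% figure.

```python
import random


def yearly_to_monthly(p_year: float) -> float:
    """Monthly transition probability that compounds to p_year over 12 months."""
    return 1.0 - (1.0 - p_year) ** (1.0 / 12.0)


def simulate_outside_fraction(
    n_people: int,
    n_months: int,
    p_out_year: float = 0.021,   # yearly probability of moving out (from the text)
    p_in_year: float = 0.0021,   # yearly probability of moving back in (assumed to apply)
    seed: int = 0,
) -> list[float]:
    """Simulate a two-state migration Markov chain; return the share outside each month."""
    rng = random.Random(seed)
    p_out = yearly_to_monthly(p_out_year)
    p_in = yearly_to_monthly(p_in_year)
    outside = [False] * n_people  # assumption: everyone starts inside the area
    fractions = [0.0]
    for _ in range(n_months):
        for i in range(n_people):
            u = rng.random()
            if outside[i]:
                if u < p_in:
                    outside[i] = False
            elif u < p_out:
                outside[i] = True
        fractions.append(sum(outside) / n_people)
    return fractions


# Five years of monthly steps, matching the length of the 2013-2017 follow-up.
fractions = simulate_outside_fraction(n_people=20000, n_months=60)
average_outside = sum(fractions) / len(fractions)
```

The share outside the area grows roughly linearly over the follow-up because the outflow probability is ten times the return probability, so the average over all months is about half of the final month's value.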
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Viljanen, M., Pahikkala, T. (2020). Predicting Unemployment with Machine Learning Based on Registry Data. In: Dalpiaz, F., Zdravkovic, J., Loucopoulos, P. (eds) Research Challenges in Information Science. RCIS 2020. Lecture Notes in Business Information Processing, vol 385. Springer, Cham. https://doi.org/10.1007/978-3-030-50316-1_21
DOI: https://doi.org/10.1007/978-3-030-50316-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50315-4
Online ISBN: 978-3-030-50316-1
eBook Packages: Computer Science, Computer Science (R0)