
Predicting Unemployment with Machine Learning Based on Registry Data

  • Conference paper

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 385)

Abstract

Many statistical models have been developed to understand the causes of unemployment, but predicting unemployment has received less attention. In this study, we develop a model that predicts the labour market state of a person using machine learning trained on a large administrative unemployment registry. The model specifies individuals as Markov chains with person-specific transition rates. We evaluate the model on three tasks, where the goal is to predict who has the highest risk of escaping unemployment, becoming unemployed, and being unemployed at any given time. We obtain good performance (AUC: 0.80) for the machine learning model of lifetime unemployment, and very good performance (AUC: 0.90+) for predictions of the near future when we know the recent labour market state of a person. We find that person information affects the predictions in an intuitive way, but there are still significant differences that can be learned by utilizing labour market histories.



Author information


Corresponding author

Correspondence to Markus Viljanen.


Appendix


Unemployment registry data has a number of potential biases if we want to generalize the results to the entire population. In that case, the training set should contain all 18- to 64-year-olds currently residing in Varsinais-Suomi with their recurrent unemployment and employment spells. However, the unemployment registry is sampled monthly and contains only people who have been jobseekers at least once during the sampling period.

Fig. 6. Registry data set compared to the full population in Varsinais-Suomi.

1.1 The Data Set Includes Only People Who Have Been Unemployed

Persons who have not been registered as jobseekers with the unemployment agencies during 2013 to 2017 are missing from the original data set because they do not have a registry entry. This is the case for people who have not been unemployed. We compared the data set to the official yearly statistics in Fig. 6, where we find that about 50% of the labour force in Varsinais-Suomi is missing. The unemployment exit rate is not biased, because every unemployed person is included. However, the baseline unemployment entry rate and prevalence are too high, because many people who are never unemployed are missing as negative examples. In other words, by definition we are analyzing unemployment among all people who experience unemployment at least once during the follow-up period.

It is still possible to estimate the true person-specific rates from the model predictions. Assume that the true unemployment entry rate is \(\mu _i\) and the exit rate is \(\lambda _i\) for person i. Denote the length of a ‘not unemployed’ spell by T and the length of follow-up by t. The probability of being missing from the data is the probability of starting outside unemployment and remaining in that state for the entire follow-up: \(P(\{X_i(t)=0\}_{t=0,1,...})=P(X_i(0)=0)P(T > t)\). The data contains all of the ‘unemployed’ observations \(P(X_i(t)=1)=\frac{\mu _i}{\lambda _i+\mu _i}\), but the proportion of ‘not unemployed’ observations included is only \(P(X_i(t)=0)-P(\{X_i(t)=0\}_{t=0,1,...})=P(X_i(0)=0)P(T\le t)=\frac{\lambda _i}{\lambda _i+\mu _i}(1-e^{-\mu _i t})\), where \(P(T\le t)=1-e^{-\mu _i t}\) for a constant entry rate \(\mu _i\). Denote the observed odds of unemployment by \(\frac{\mu _i^*}{\lambda _i}\), which should be equal to the odds \(P(X_i(t)=1)/\left( P(X_i(t)=0)P(T\le t)\right) =\frac{\mu _i}{\lambda _i (1-e^{-\mu _i t})}\). This means we can solve:

$$\begin{aligned} \begin{array}{c} \frac{\mu _i}{1-e^{-\mu _i t}} = \mu _i^* \end{array} \end{aligned}$$
(16)

We then obtain the true rate \(\mu _i\) that produces the observed unemployment entry rate \(\mu _i^*\). As the follow-up grows, \(t\longrightarrow \infty \), every person is eventually observed and \(\mu _i^*\) converges to \(\mu _i\).
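Equation (16) has no closed-form solution for \(\mu _i\), so it has to be solved numerically. The following is a minimal sketch, not the authors' implementation: it uses plain bisection, and the function names, the monthly time unit and the 60-month follow-up are assumptions made for illustration. Because the left-hand side of (16) increases in \(\mu _i\) and tends to \(1/t\) as \(\mu _i\rightarrow 0\), a positive solution exists only when \(\mu _i^* > 1/t\).

```python
import math


def observed_entry_rate(mu: float, t: float) -> float:
    """Left-hand side of Eq. (16): mu / (1 - exp(-mu * t))."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return mu / -math.expm1(-mu * t)


def true_entry_rate(mu_star: float, t: float, iterations: int = 200) -> float:
    """Solve Eq. (16) for mu by bisection (hypothetical helper)."""
    if mu_star * t <= 1.0:
        raise ValueError("no positive solution unless mu_star > 1/t")
    # The root lies in (0, mu_star): the left-hand side tends to 1/t < mu_star
    # near zero and exceeds mu_star when evaluated at mu = mu_star.
    lo, hi = 0.0, mu_star
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if observed_entry_rate(mid, t) < mu_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip with assumed values: a true rate of 0.02 per month and
# a follow-up of 60 months (both chosen for illustration only).
mu_star = observed_entry_rate(0.02, 60.0)
mu_hat = true_entry_rate(mu_star, 60.0)
```

In this round trip `mu_star` is larger than the true rate, which reflects the upward bias described above, and `mu_hat` recovers the value that was put in.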

Fig. 7. Survival functions, hazards and probability densities of the unemployment (exit) and non-unemployment (enter) spell lengths, estimated from the first spell.

1.2 The Data Set Excludes Some Short Spells

Spells shorter than one calendar month are undersampled because the registry status is recorded monthly. A person may enter and exit unemployment between two monthly measurements without being recorded. We estimate what percentage of spells is missing by calculating the Kaplan-Meier estimate \(S(t)=P(T>t)\) of the first spell length T in Fig. 7. The second-month hazard can be used to estimate the true first-month hazard, as shown in the bottom-left panel. For example, assuming that the first-month hazards should be 0.25/month (exit) and 0.12/month (entry), the percentage of spells that end in the first month should be \(1-e^{-0.25}\approx 22\%\) (exit) and \(1-e^{-0.12}\approx 11\%\) (entry) instead of the \(1-e^{-0.12}\approx 11\%\) (exit) and \(1-e^{-0.06}\approx 6\%\) (entry) that were observed. These spells are a small subset of the data set, and short spells do not meaningfully contribute to the total amount of unemployment.
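The percentages above follow from treating the hazard as constant within the first month, so that the share of spells ending in that month is \(1-e^{-h}\). A short sketch of this arithmetic is given below; the function name and dictionary layout are illustrative assumptions, while the hazard values are the ones quoted in the text.

```python
import math


def first_month_share(hazard_per_month: float) -> float:
    """Share of spells ending within one month under a constant hazard."""
    return -math.expm1(-hazard_per_month)  # equals 1 - exp(-hazard)


# Hazards quoted in the text: assumed true values and observed values.
assumed_hazard = {"exit": 0.25, "entry": 0.12}
observed_hazard = {"exit": 0.12, "entry": 0.06}

for kind in ("exit", "entry"):
    expected = first_month_share(assumed_hazard[kind])
    observed = first_month_share(observed_hazard[kind])
    print(f"{kind}: expected {expected:.0%}, observed {observed:.0%}")
```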

1.3 The Data Set Includes Some People Who Have Moved Out

Finally, we have no knowledge of who remains in or moves out of the Varsinais-Suomi area. The data set may include persons who have moved out and are no longer at risk of being recorded in the unemployment registry. This bias can be estimated with a simple Monte Carlo simulation. From the government movement statistics for the years 2013–2017 (StatFin) we can calculate the migration rates within Finland. Each year on average 2.1% of the Varsinais-Suomi population moved to other economic areas, and 0.21% of the population in other areas moved to Varsinais-Suomi. We assume that the migration of people follows a Markov chain with the corresponding monthly transition probabilities. We then overlay the movement patterns generated from this Markov chain onto the data set, as seen on the left of Fig. 8, and calculate the percentage of people who are outside Varsinais-Suomi each month, as seen on the right of Fig. 8. This implies that about 6% of the samples at any given time were probably outside Varsinais-Suomi.
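A simulation of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the paper: it converts the yearly rates to monthly probabilities with \(1-(1-p)^{1/12}\), starts every simulated person inside Varsinais-Suomi at month 0, follows them for 60 months, and does not overlay the result onto actual registry entry dates. All function names, the sample size and the seed are hypothetical.

```python
import random


def monthly_probability(annual_probability: float) -> float:
    """Convert a yearly transition probability to a monthly one (assumed rule)."""
    return 1.0 - (1.0 - annual_probability) ** (1.0 / 12.0)


def simulate_share_outside(
    n_people: int = 20000,
    n_months: int = 60,
    annual_out: float = 0.021,   # yearly share moving out, from the text
    annual_in: float = 0.0021,   # yearly share moving in, from the text
    seed: int = 0,
) -> list[float]:
    """Return the simulated share of people outside the area for each month."""
    rng = random.Random(seed)
    p_out = monthly_probability(annual_out)
    p_in = monthly_probability(annual_in)
    outside = [False] * n_people  # assumption: everyone starts inside
    shares = []
    for _ in range(n_months):
        for i in range(n_people):
            u = rng.random()
            if outside[i]:
                if u < p_in:
                    outside[i] = False  # moved back in
            elif u < p_out:
                outside[i] = True  # moved out
        shares.append(sum(outside) / n_people)
    return shares


shares = simulate_share_outside()
average_share = sum(shares) / len(shares)
```

Under these assumptions the monthly share outside the area grows over the follow-up, and `average_share` gives a rough counterpart to the per-month percentage discussed above. The exact figure depends on when each person enters the data set, which this sketch does not model.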

Fig. 8. The real-world moving-out bias is estimated with a Monte Carlo simulation.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Viljanen, M., Pahikkala, T. (2020). Predicting Unemployment with Machine Learning Based on Registry Data. In: Dalpiaz, F., Zdravkovic, J., Loucopoulos, P. (eds) Research Challenges in Information Science. RCIS 2020. Lecture Notes in Business Information Processing, vol 385. Springer, Cham. https://doi.org/10.1007/978-3-030-50316-1_21


  • DOI: https://doi.org/10.1007/978-3-030-50316-1_21


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-50315-4

  • Online ISBN: 978-3-030-50316-1

  • eBook Packages: Computer Science, Computer Science (R0)
