Comparison of Machine Learning Methods for Life Trajectory Analysis in Demography

  • Conference paper
  • In: Intelligent Information and Database Systems (ACIIDS 2021)

Abstract

Large, representative collections of demographic data are now available, from which demographic sequences can be extracted and then analysed and interpreted by domain experts. Because traditional statistical methods do not meet the emerging needs of demography, we use modern pattern mining and machine learning methods to achieve better results. Our collaborators, the demographers, are interested in two main problems: predicting the next event in a personal life trajectory, and finding interesting patterns of demographic events with respect to the gender feature. The main goal of this paper is to compare different methods by accuracy on these tasks. We consider interpretable methods, such as decision trees, as well as semi- and non-interpretable methods, such as SVMs with custom kernels and neural networks. The best accuracy is obtained with a two-channel convolutional neural network. All results and discovered patterns are passed to the demographers for further investigation.
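
As a rough illustration of what a custom sequence kernel for an SVM could look like, the sketch below computes a Gram matrix over toy life trajectories using the length of the longest common prefix as the similarity. This is not the kernel used in the paper: the event names, the trajectories, and the functions `common_prefix_kernel` and `gram_matrix` are assumptions chosen only for illustration.

```python
from typing import List, Sequence

# Hypothetical toy trajectories (not from the paper's dataset):
# each one is an ordered list of demographic events.
TRAJECTORIES: List[List[str]] = [
    ["education", "work", "partner", "marriage", "child"],
    ["education", "work", "separation"],
    ["work", "education", "partner"],
]


def common_prefix_kernel(a: Sequence[str], b: Sequence[str]) -> int:
    """Illustrative similarity: length of the longest common prefix.

    This counts the prefixes shared by both sequences, so it is an
    inner product of prefix-indicator vectors and hence a valid
    (positive semi-definite) kernel.
    """
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def gram_matrix(seqs: Sequence[Sequence[str]]) -> List[List[int]]:
    """Pairwise kernel values for all trajectories."""
    return [[common_prefix_kernel(s, t) for t in seqs] for s in seqs]


GRAM = gram_matrix(TRAJECTORIES)
for row in GRAM:
    print(row)
```

A matrix of this kind could be handed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, together with class labels such as gender.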

Notes

  1. https://keras.io.

  2. https://github.com/anya-m/2CNNSeqDem/.

  3. For emerging patterns in a classification setting, cf. [8].

References

  1. Aisenbrey, S., Fasang, A.E.: New life for old ideas: the “second wave” of sequence analysis bringing the “course” back into the life course. Soc. Meth. Res. 38(3), 420–462 (2010). https://doi.org/10.1177/0049124109357532

  2. Billari, F.C.: Sequence analysis in demographic research. Can. Stud. Popul. [Arch.] 28, 439–458 (2001)

  3. Billari, F.C., Fürnkranz, J., Prskawetz, A.: Timing, sequencing, and quantum of life course events: a machine learning approach. Eur. J. Popul. (Revue européenne de Démographie) 22(1), 37–65 (2006). https://doi.org/10.1007/s10680-005-5549-0

  4. Blockeel, H., Fürnkranz, J., Prskawetz, A., Billari, F.C.: Detecting temporal change in event sequences: an application to demographic data. In: De Raedt, L., Siebes, A. (eds.) PKDD 2001. LNCS (LNAI), vol. 2168, pp. 29–41. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44794-6_3

  5. Buzmakov, A., Egho, E., Jay, N., Kuznetsov, S.O., Napoli, A., Raïssi, C.: On mining complex sequential data by means of FCA and pattern structures. Int. J. Gen. Syst. 45(2), 135–159 (2016). https://doi.org/10.1080/03081079.2015.1072925

  6. Caruana, R., Lundberg, S., Ribeiro, M.T., Nori, H., Jenkins, S.: Intelligible and explainable machine learning: best practices and practical challenges. In: Gupta, R., Liu, Y., Tang, J., Prakash, B.A. (eds.) The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2020, pp. 3511–3512. ACM (2020). https://dl.acm.org/doi/10.1145/3394486.3406707

  7. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Moschitti, A., Pang, B., Daelemans, W. (eds.) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, pp. 1724–1734. ACL (2014). https://doi.org/10.3115/v1/d14-1179

  8. Dong, G., Li, J.: Emerging pattern based classification. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, 2nd edn. Springer, Boston (2018). https://doi.org/10.1007/978-1-4614-8265-9_5002

  9. Egho, E., Raïssi, C., Calders, T., Jay, N., Napoli, A.: On measuring similarity for sequences of itemsets. Data Min. Knowl. Discov. 29(3), 732–764 (2014). https://doi.org/10.1007/s10618-014-0362-1

  10. Elzinga, C.H., Liefbroer, A.C.: De-standardization of family-life trajectories of young adults: a cross-national comparison using sequence analysis. Eur. J. Popul. (Revue européenne de Démographie) 23(3), 225–250 (2007). https://doi.org/10.1007/s10680-007-9133-7

  11. Gizdatullin, D., Baixeries, J., Ignatov, D.I., Mitrofanova, E., Muratova, A., Espy, T.H.: Learning interpretable prefix-based patterns from demographic sequences. In: Strijov, V.V., Ignatov, D.I., Vorontsov, K.V. (eds.) IDP 2016. CCIS, vol. 794, pp. 74–91. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35400-8_6

  12. Gizdatullin, D., Ignatov, D., Mitrofanova, E., Muratova, A.: Classification of demographic sequences based on pattern structures and emerging patterns. In: Supplementary Proceedings of 14th International Conference on Formal Concept Analysis, ICFCA, pp. 49–66 (2017)

  13. Hochreiter, S., Schmidhuber, J.: LSTM can solve hard long time lag problems. In: Mozer, M., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems (NIPS), Denver, CO, USA, 2–5 December, vol. 9, pp. 473–479. MIT Press (1996)

  14. Ignatov, D.I., Mitrofanova, E., Muratova, A., Gizdatullin, D.: Pattern mining and machine learning for demographic sequences. In: Klinov, P., Mouromtsev, D. (eds.) KESW 2015. CCIS, vol. 518, pp. 225–239. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24543-0_17

  15. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. J. Mach. Learn. Res. 2, 419–444 (2002). http://jmlr.org/papers/v2/lodhi02a.html

  16. Muratova, A., Sushko, P., Espy, T.H.: Black-box classification techniques for demographic sequences: from customised SVM to RNN. In: Tagiew, R., Ignatov, D.I., Hilbert, A., Heinrich, K., Delhibabu, R. (eds.) Proceedings of the 4th Workshop on Experimental Economics and Machine Learning, EEML 2017, Dresden, Germany, 17–18 September 2017, pp. 31–40. CEUR Workshop Proceedings, Aachen (2017). http://ceur-ws.org/Vol-1968/paper4.pdf

  17. Piccarreta, R., Studer, M.: Holistic analysis of the life course: methodological challenges and new perspectives. Adv. Life Course Res. (2019). https://doi.org/10.1016/j.alcr.2018.10.004

  18. Puur, A., Rahnu, L., Maslauskaite, A., Stankuniene, V., Zakharov, S.: Transformation of partnership formation in eastern Europe: the legacy of the past demographic divide. J. Comp. Fam. Stud. 43, 389–417 (2012). https://doi.org/10.3138/jcfs.43.3.389

  19. Ritschard, G., Studer, M.: Sequence analysis: where are we, where are we going? In: Ritschard, G., Studer, M. (eds.) Sequence Analysis and Related Approaches. LCRSP, vol. 10, pp. 1–11. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95420-2_1

  20. Rossignon, F., Studer, M., Gauthier, J.-A., Goff, J.-M.L.: Sequence history analysis (SHA): estimating the effect of past trajectories on an upcoming event. In: Ritschard, G., Studer, M. (eds.) Sequence Analysis and Related Approaches. LCRSP, vol. 10, pp. 83–100. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95420-2_6

  21. Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019). https://doi.org/10.1038/s42256-019-0048-x

  22. Ryšavý, P., Železný, F.: Estimating sequence similarity from read sets for clustering next-generation sequencing data. Data Min. Knowl. Discov. 33(1), 1–23 (2018). https://doi.org/10.1007/s10618-018-0584-8

  23. Solomonoff, R.J.: The Kolmogorov lecture: the universal distribution and machine learning. Comput. J. 46(6), 598–601 (2003). https://doi.org/10.1093/comjnl/46.6.598

  24. Tomczak, J.M., Zieba, M.: Probabilistic combination of classification rules and its application to medical diagnosis. Mach. Learn. 101(1–3), 105–135 (2015). https://doi.org/10.1007/s10994-015-5508-x

  25. Wahrendorf, M.: Agreement of self-reported and administrative data on employment histories in a German cohort study: a sequence analysis. Eur. J. Popul. 35(2), 329–346 (2018). https://doi.org/10.1007/s10680-018-9476-2

  26. Wang, T., Duan, L., Dong, G., Bao, Z.: Efficient mining of outlying sequence patterns for analyzing outlierness of sequence data. ACM Trans. Knowl. Discov. Data 14(5), 62:1–62:26 (2020). https://doi.org/10.1145/3399671

  27. Zimmermann, A., Nijssen, S.: Supervised pattern mining and applications to classification. In: Aggarwal, C.C., Han, J. (eds.) Frequent Pattern Mining. LCRSP, pp. 425–442. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07821-2_17

Acknowledgment

The authors would like to thank Prof. G. Dong for his interest in our previous work on prefix-based emerging sequential patterns.

The study was implemented in the framework of the Basic Research Program at the National Research University Higher School of Economics and funded by the Russian Academic Excellence Project ‘5-100’. This research is also supported by the Faculty of Social Sciences, National Research University Higher School of Economics.

Author information

Corresponding author

Correspondence to Anna Muratova.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Muratova, A., Mitrofanova, E., Islam, R. (2021). Comparison of Machine Learning Methods for Life Trajectory Analysis in Demography. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science, vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_50

  • DOI: https://doi.org/10.1007/978-3-030-73280-6_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73279-0

  • Online ISBN: 978-3-030-73280-6

  • eBook Packages: Computer Science (R0)
