Abstract
There are now representative volumes of demographic data from which demographic sequences can be extracted and then analysed and interpreted by domain experts. Since traditional statistical methods cannot meet the emerging needs of demography, we use modern pattern mining and machine learning methods to achieve better results. Our collaborators, the demographers, are interested in two main problems: predicting the next event in a personal life trajectory, and finding interesting patterns of demographic events with respect to gender. The main goal of this paper is to compare the accuracy of different methods on these tasks. We consider interpretable methods, such as decision trees, and semi- and non-interpretable methods, such as SVMs with custom kernels and neural networks. The best accuracy is obtained with a two-channel convolutional neural network. All results and discovered patterns are passed to the demographers for further investigation.
Notes
- 1.
- 2.
- 3. For emerging patterns in a classification setting, cf. [8].
References
Aisenbrey, S., Fasang, A.E.: New life for old ideas: the “second wave” of sequence analysis bringing the “course” back into the life course. Soc. Meth. Res. 38(3), 420–462 (2010). https://doi.org/10.1177/0049124109357532
Billari, F.C.: Sequence analysis in demographic research. Can. Stud. Popul. [Arch.] 28, 439–458 (2001)
Billari, F.C., Fürnkranz, J., Prskawetz, A.: Timing, sequencing, and quantum of life course events: a machine learning approach. Eur. J. Popul. (Revue européenne de Démographie) 22(1), 37–65 (2006). https://doi.org/10.1007/s10680-005-5549-0
Blockeel, H., Fürnkranz, J., Prskawetz, A., Billari, F.C.: Detecting temporal change in event sequences: an application to demographic data. In: De Raedt, L., Siebes, A. (eds.) PKDD 2001. LNCS (LNAI), vol. 2168, pp. 29–41. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44794-6_3
Buzmakov, A., Egho, E., Jay, N., Kuznetsov, S.O., Napoli, A., Raïssi, C.: On mining complex sequential data by means of FCA and pattern structures. Int. J. Gen. Syst. 45(2), 135–159 (2016). https://doi.org/10.1080/03081079.2015.1072925
Caruana, R., Lundberg, S., Ribeiro, M.T., Nori, H., Jenkins, S.: Intelligible and explainable machine learning: best practices and practical challenges. In: Gupta, R., Liu, Y., Tang, J., Prakash, B.A. (eds.) The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2020, pp. 3511–3512. ACM (2020). https://dl.acm.org/doi/10.1145/3394486.3406707
Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Moschitti, A., Pang, B., Daelemans, W. (eds.) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, pp. 1724–1734. ACL (2014). https://doi.org/10.3115/v1/d14-1179
Dong, G., Li, J.: Emerging pattern based classification. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, 2nd edn. Springer, Boston (2018). https://doi.org/10.1007/978-1-4614-8265-9_5002
Egho, E., Raïssi, C., Calders, T., Jay, N., Napoli, A.: On measuring similarity for sequences of itemsets. Data Min. Knowl. Discov. 29(3), 732–764 (2014). https://doi.org/10.1007/s10618-014-0362-1
Elzinga, C.H., Liefbroer, A.C.: De-standardization of family-life trajectories of young adults: a cross-national comparison using sequence analysis. Eur. J. Popul. (Revue européenne de Démographie) 23(3), 225–250 (2007). https://doi.org/10.1007/s10680-007-9133-7
Gizdatullin, D., Baixeries, J., Ignatov, D.I., Mitrofanova, E., Muratova, A., Espy, T.H.: Learning interpretable prefix-based patterns from demographic sequences. In: Strijov, V.V., Ignatov, D.I., Vorontsov, K.V. (eds.) IDP 2016. CCIS, vol. 794, pp. 74–91. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35400-8_6
Gizdatullin, D., Ignatov, D., Mitrofanova, E., Muratova, A.: Classification of demographic sequences based on pattern structures and emerging patterns. In: Supplementary Proceedings of 14th International Conference on Formal Concept Analysis, ICFCA, pp. 49–66 (2017)
Hochreiter, S., Schmidhuber, J.: LSTM can solve hard long time lag problems. In: Mozer, M., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems (NIPS), Denver, CO, USA, 2–5 December, vol. 9, pp. 473–479. MIT Press (1996)
Ignatov, D.I., Mitrofanova, E., Muratova, A., Gizdatullin, D.: Pattern mining and machine learning for demographic sequences. In: Klinov, P., Mouromtsev, D. (eds.) KESW 2015. CCIS, vol. 518, pp. 225–239. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24543-0_17
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. J. Mach. Learn. Res. 2, 419–444 (2002). http://jmlr.org/papers/v2/lodhi02a.html
Muratova, A., Sushko, P., Espy, T.H.: Black-box classification techniques for demographic sequences: from customised SVM to RNN. In: Tagiew, R., Ignatov, D.I., Hilbert, A., Heinrich, K., Delhibabu, R. (eds.) Proceedings of the 4th Workshop on Experimental Economics and Machine Learning, EEML 2017, Dresden, Germany, 17–18 September 2017, pp. 31–40. CEUR Workshop Proceedings, Aachen (2017). http://ceur-ws.org/Vol-1968/paper4.pdf
Piccarreta, R., Studer, M.: Holistic analysis of the life course: methodological challenges and new perspectives. Adv. Life Course Res. (2019). https://doi.org/10.1016/j.alcr.2018.10.004
Puur, A., Rahnu, L., Maslauskaite, A., Stankuniene, V., Zakharov, S.: Transformation of partnership formation in eastern Europe: the legacy of the past demographic divide. J. Comp. Fam. Stud. 43, 389–417 (2012). https://doi.org/10.3138/jcfs.43.3.389
Ritschard, G., Studer, M.: Sequence analysis: where are we, where are we going? In: Ritschard, G., Studer, M. (eds.) Sequence Analysis and Related Approaches. LCRSP, vol. 10, pp. 1–11. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95420-2_1
Rossignon, F., Studer, M., Gauthier, J.-A., Goff, J.-M.L.: Sequence history analysis (SHA): estimating the effect of past trajectories on an upcoming event. In: Ritschard, G., Studer, M. (eds.) Sequence Analysis and Related Approaches. LCRSP, vol. 10, pp. 83–100. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95420-2_6
Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019). https://doi.org/10.1038/s42256-019-0048-x
Ryšavý, P., Železný, F.: Estimating sequence similarity from read sets for clustering next-generation sequencing data. Data Min. Knowl. Discov. 33(1), 1–23 (2018). https://doi.org/10.1007/s10618-018-0584-8
Solomonoff, R.J.: The Kolmogorov lecture: the universal distribution and machine learning. Comput. J. 46(6), 598–601 (2003). https://doi.org/10.1093/comjnl/46.6.598
Tomczak, J.M., Zieba, M.: Probabilistic combination of classification rules and its application to medical diagnosis. Mach. Learn. 101(1–3), 105–135 (2015). https://doi.org/10.1007/s10994-015-5508-x
Wahrendorf, M.: Agreement of self-reported and administrative data on employment histories in a German cohort study: a sequence analysis. Eur. J. Popul. 35(2), 329–346 (2018). https://doi.org/10.1007/s10680-018-9476-2
Wang, T., Duan, L., Dong, G., Bao, Z.: Efficient mining of outlying sequence patterns for analyzing outlierness of sequence data. ACM Trans. Knowl. Discov. Data 14(5), 62:1–62:26 (2020). https://doi.org/10.1145/3399671
Zimmermann, A., Nijssen, S.: Supervised pattern mining and applications to classification. In: Aggarwal, C.C., Han, J. (eds.) Frequent Pattern Mining. LCRSP, pp. 425–442. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07821-2_17
Acknowledgment
The authors would like to thank Prof. G. Dong for his interest in our previous work on prefix-based emerging sequential patterns.
The study was implemented in the framework of the Basic Research Program at the National Research University Higher School of Economics and funded by the Russian Academic Excellence Project ‘5-100’. This research is also supported by the Faculty of Social Sciences, National Research University Higher School of Economics.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Muratova, A., Mitrofanova, E., Islam, R. (2021). Comparison of Machine Learning Methods for Life Trajectory Analysis in Demography. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science, vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_50
DOI: https://doi.org/10.1007/978-3-030-73280-6_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73279-0
Online ISBN: 978-3-030-73280-6
eBook Packages: Computer Science, Computer Science (R0)