Abstract
Socioeconomic status (SES) is an important economic and social attribute that attracts wide attention. Assessing individual SES can help relevant organizations make a variety of policy decisions. Traditional approaches suffer from the extremely high cost of collecting large-scale SES-related survey data. With the ubiquity of smartphones, mobile phone data has become a novel, low-cost data source for predicting individual SES. However, predicting individual SES from mobile phone data also poses new challenges, including sparse individual records, scarce explicit relationships, and limited labeled samples. Prior work, which is restricted to regional or household-oriented SES prediction, has not addressed these challenges. To address these issues, we propose a semi-supervised hypergraph-based factor graph model (HyperFGM) for individual SES prediction. HyperFGM efficiently captures the associations between SES and individual mobile phone records to handle the sparsity of individual records. To compensate for scarce explicit relationships, HyperFGM models implicit high-order relationships among users on a hypergraph structure. In addition, HyperFGM exploits both the limited labeled data and the unlabeled data in a semi-supervised way. Experimental results on a set of anonymized real-world mobile phone data show that HyperFGM substantially outperforms the baseline methods for individual SES prediction.
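To make the semi-supervised, high-order idea above more concrete, the following is a minimal sketch of hypergraph label propagation in the style of Zhou, Huang and Schölkopf ("Learning with hypergraphs", NIPS 2007). It is not HyperFGM: the abstract does not specify HyperFGM's factor functions or its inference procedure, so this sketch swaps in a simpler, well-known transductive baseline. Users are vertices, each hyperedge groups users who share some implicit context, and SES labels from a few labeled users spread to unlabeled ones. The function name, its parameters, and the toy hyperedges (imagined here as users seen at the same cell tower in the same time window) are hypothetical illustrations.

```python
import math
from typing import Dict, List, Optional


def hypergraph_label_propagation(
    num_users: int,
    hyperedges: List[List[int]],
    labels: Dict[int, int],
    num_classes: int,
    alpha: float = 0.9,
    iters: int = 100,
    weights: Optional[List[float]] = None,
) -> List[int]:
    """Hypothetical sketch: label propagation on a hypergraph (Zhou et al., 2007).

    Iterates F <- alpha * Theta @ F + (1 - alpha) * Y, where
    Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}.
    Assumes every hyperedge is non-empty and lists each user at most once.
    """
    if weights is None:
        weights = [1.0] * len(hyperedges)

    # Vertex degree d(v): total weight of the hyperedges containing v.
    dv = [0.0] * num_users
    for w, edge in zip(weights, hyperedges):
        for v in edge:
            dv[v] += w
    # Precompute Dv^{-1/2}; isolated users get 0 and keep only their seed row.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in dv]

    # Y: one-hot rows for labeled users, all-zero rows for unlabeled users.
    Y = [[0.0] * num_classes for _ in range(num_users)]
    for user, cls in labels.items():
        Y[user][cls] = 1.0

    F = [row[:] for row in Y]
    for _ in range(iters):
        acc = [[0.0] * num_classes for _ in range(num_users)]
        for w, edge in zip(weights, hyperedges):
            # Hyperedge message: (w(e) / delta(e)) * sum_{u in e} F[u] / sqrt(d(u)),
            # where delta(e) = |e| is the hyperedge degree.
            msg = [
                w * sum(F[u][k] * inv_sqrt[u] for u in edge) / len(edge)
                for k in range(num_classes)
            ]
            for v in edge:
                for k in range(num_classes):
                    acc[v][k] += msg[k]
        # Finish Theta @ F with the left Dv^{-1/2}, then mix the seeds back in.
        F = [
            [alpha * inv_sqrt[v] * acc[v][k] + (1 - alpha) * Y[v][k]
             for k in range(num_classes)]
            for v in range(num_users)
        ]

    # Predicted class for each user = argmax of that user's score row.
    return [max(range(num_classes), key=lambda k: F[v][k]) for v in range(num_users)]


# Hypothetical toy data: 6 users, 2 SES classes (0 = low, 1 = high).
# Users 2 and 3 share an extra hyperedge that bridges the two groups.
toy_edges = [[0, 1, 2], [3, 4, 5], [2, 3]]
toy_labels = {0: 0, 5: 1}
print(hypergraph_label_propagation(6, toy_edges, toy_labels, num_classes=2))
# → [0, 0, 0, 1, 1, 1]
```

With only two labeled users, every unlabeled user still receives a prediction, because each hyperedge passes information among all of its members at once rather than through explicit pairwise links. HyperFGM, as described above, additionally models the associations between SES and each user's own phone records, which this fixed smoothness rule does not capture.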
Additional information
This work is partially supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 824019, by the National Natural Science Foundation of China No. 61802140, and by the Hubei Provincial Natural Science Foundation No. 2018CFB200.
Cite this article
Zhao, T., Huang, H., Yao, X. et al. Predicting individual socioeconomic status from mobile phone data: a semi-supervised hypergraph-based factor graph approach. Int J Data Sci Anal 9, 361–372 (2020). https://doi.org/10.1007/s41060-019-00195-z