Abstract
Social networks are a rich source of information that reflects people’s real lives in the digital space. This makes it possible to infer various aspects of a user’s socioeconomic behavior, even if the user does not state them explicitly. In this study, we combine two data sources: the Russian online social network VK.com, which is analogous to the global Facebook platform, and supplementary financial information provided by a bank. Combining online social media data with debit card transactions, we train machine learning models to infer the socioeconomic status (SES) of a user, as well as six purchasing patterns that characterize customer transactional activity of a certain type. Namely, we detect whether a user is a driver, parent, gamer, or traveler, and whether he/she prefers to make purchases at night or in the morning. SES is defined as average monthly expenses and treated as a real-valued variable. The following features are extracted as predictors: demographic information from a user’s page, user participation in communities, topics of those communities, text embeddings of user posts, topological characteristics, and graph embeddings of nodes in the friendship graph. The results show the superiority of graph embeddings in both classification and regression tasks (median absolute percentage error MedAPE = 29.7 for SES). Moreover, for drivers (Macro-\(F_1=0.688\)) and parents (Macro-\(F_1=0.679\)), higher scores are reached by concatenating different features. In addition, we investigate feature importance values and find that the topics of user communities and the structure of the user’s network influence the model more strongly than other features. The study demonstrates the power of online social media data for inferring user socioeconomic attributes.
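The two evaluation metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions: MedAPE as the median of |y − ŷ| / |y| expressed in percent, and Macro-\(F_1\) as the unweighted mean of per-class \(F_1\) scores. The exact formulas used in the study are not given in this section, and the function names and sample values below are hypothetical.

```python
from statistics import median


def medape(y_true, y_pred):
    """Median absolute percentage error in percent (assumed standard definition)."""
    return 100.0 * median(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed standard definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical monthly expenses (regression target) and model predictions
expenses_true = [100.0, 200.0, 400.0]
expenses_pred = [110.0, 150.0, 400.0]
ses_error = medape(expenses_true, expenses_pred)

# Hypothetical binary labels for one purchasing pattern (1 = driver)
driver_true = [1, 1, 0, 0]
driver_pred = [1, 0, 0, 0]
driver_score = macro_f1(driver_true, driver_pred)

print(f"MedAPE = {ses_error:.1f}, Macro-F1 = {driver_score:.3f}")
```

Because MedAPE takes a median rather than a mean, a few users with very large relative errors do not dominate the score, which is one plausible reason to prefer it for a heavy-tailed quantity such as monthly expenses.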
Data availability
The dataset generated and analyzed during the current study is not publicly available due to the bank’s privacy statement.
Notes
We use the comprehensive collection of stop-words for the Russian language, which is available at “https://github.com/stopwords-iso/stopwords-ru”.
Acknowledgements
This research is financially supported by the Russian Science Foundation, Agreement #17-71-30029, with co-financing from Bank Saint Petersburg. We are extremely grateful to Max Petrov for collecting data from social media. We also thank Mariia Bardina for her assistance with topic modeling.
Author information
Contributions
The contribution of all authors to the manuscript is quite balanced. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
About this article
Cite this article
Kalinin, A., Vaganov, D. & Bochenina, K. Discovering patterns of customer financial behavior using social media data. Soc. Netw. Anal. Min. 10, 77 (2020). https://doi.org/10.1007/s13278-020-00690-3
DOI: https://doi.org/10.1007/s13278-020-00690-3