Discovering patterns of customer financial behavior using social media data

  • Original Article
  • Social Network Analysis and Mining

Abstract

Social networks are a valuable source of information that reflects people’s real lives in the digital space. This makes it possible to infer various aspects of a user’s socioeconomic behavior, even if he or she does not state them explicitly. In this study, we combine two data sources: the Russian online social network VK.com, which is an analog of the global Facebook platform, and supplementary financial information provided by a bank. Combining online social media data with debit card transactions, we train machine learning models to infer the socioeconomic status (SES) of a user, as well as six purchasing patterns, each of which characterizes customer transactional activity of a certain type. Namely, we detect whether a user is a driver, parent, gamer, or traveler, and whether he or she prefers to make purchases at night or in the morning. SES is defined as average monthly expenses and is treated as a real-valued variable. The following features are extracted as predictors: demographic information from a user’s page, user participation in communities, topics of those communities, text embeddings of user posts, topological characteristics, and graph embeddings of nodes in the friendship graph. The results show the superiority of graph embeddings in both classification and regression tasks (median absolute percentage error MedAPE = 29.7 for SES). Moreover, for drivers (Macro-\(F_1=0.688\)) and parents (Macro-\(F_1=0.679\)), higher scores are reached by concatenating different features. In addition, we investigate feature importance values and find that the topics of user communities and the structure of the user’s network influence the model more strongly than other features. The study demonstrates the power of online social media data for inferring user socioeconomic attributes.
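The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes that MedAPE is reported in percent and that zero-valued targets are skipped; the function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import median


def medape(y_true, y_pred):
    """Median absolute percentage error, in percent.

    Assumption: targets equal to zero are skipped to avoid division by zero.
    """
    errors = [
        100.0 * abs(t - p) / abs(t)
        for t, p in zip(y_true, y_pred)
        if t != 0
    ]
    return median(errors)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy regression example: hypothetical monthly expenses versus predictions.
expenses_true = [100.0, 200.0, 400.0]
expenses_pred = [110.0, 150.0, 400.0]
print(medape(expenses_true, expenses_pred))

# Toy binary example: 1 = "driver", 0 = "not a driver".
print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because Macro-\(F_1\) averages the per-class scores without weighting by class size, a rare class such as a minority purchasing pattern counts as much as the majority class.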

Data availability

The dataset generated and analyzed during the current study is not publicly available due to the bank’s privacy statement.

Notes

  1. We use a comprehensive collection of stop-words for the Russian language, which is available at “https://github.com/stopwords-iso/stopwords-ru”.
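Stop-word filtering of this kind can be sketched as follows. The sketch assumes a local copy of the list saved as a UTF-8 text file with one word per line; the file path, the function names, and the tiny inline word set are hypothetical and serve only as an illustration, not as the authors’ preprocessing code.

```python
from pathlib import Path


def load_stopwords(path):
    """Read one stop-word per line from a UTF-8 text file (hypothetical path)."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens whose lowercase form is not a stop-word."""
    return [tok for tok in tokens if tok.lower() not in stopwords]


# Tiny illustrative set; a real run would call load_stopwords on the full list.
sample_stopwords = {"и", "в", "не", "на"}
tokens = ["Я", "не", "был", "в", "банке"]
filtered = remove_stopwords(tokens, sample_stopwords)
print(filtered)
```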

Acknowledgements

This research is financially supported by the Russian Science Foundation, Agreement #17-71-30029, with co-financing from Bank Saint Petersburg. We are extremely grateful to Max Petrov for collecting the social media data. We also thank Mariia Bardina for her assistance with topic modeling.

Author information

Contributions

The authors’ contributions to the manuscript are balanced. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Alexander Kalinin.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kalinin, A., Vaganov, D. & Bochenina, K. Discovering patterns of customer financial behavior using social media data. Soc. Netw. Anal. Min. 10, 77 (2020). https://doi.org/10.1007/s13278-020-00690-3

  • DOI: https://doi.org/10.1007/s13278-020-00690-3
