Abstract
With the rapid development of microblogs in recent years, accurate prediction of microblog user profiles has become valuable for marketing, personalized recommendation, and legal investigation. Microblog users post rich content every day and build a complex friendship network through “following” behaviors. Both user-generated content and the friendship network are crucial for user profiling. In this work, we propose a neural-network-based model for user profiling. It takes advantage of both user-generated content and the friendship network through attentional multi-scale convolutional neural networks and graph embeddings. We evaluate our model on the SMP CUP 2016 dataset, whose task is to infer the age, gender, and region of microblog users. The experimental results show that, by utilizing information from both user-generated content and the friendship network, our method achieves state-of-the-art performance on all three sub-tasks.
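The abstract names attentional multi-scale CNNs and graph embeddings but gives no equations. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: token vectors are already given; each filter width (1, 2, 3) is treated as one "scale" whose feature maps are ReLU-activated and max-pooled over time; attention is assumed to be a softmax over scales scored by a dot product with a context vector; and the user's friendship-network embedding is a hypothetical precomputed vector that is simply concatenated with the text vector. All function names are illustrative, and the weights are random rather than trained.

```python
import math
import random


def conv1d_maxpool(seq, filters, width):
    """Convolve each filter (a flat list of width * dim weights) over the
    token sequence, apply ReLU, and max-pool over time (one value per filter)."""
    pooled = []
    for f in filters:
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(seq) - width + 1):
            # Flatten the window of `width` consecutive token vectors.
            window = [x for tok in seq[start:start + width] for x in tok]
            activation = max(0.0, sum(w * x for w, x in zip(f, window)))
            best = max(best, activation)
        pooled.append(best)
    return pooled


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentional_multiscale(seq, filters_by_width, context):
    """Compute one pooled vector per filter width (assumed 'scale'), then
    combine them with softmax attention scored against a context vector."""
    scale_vecs = [conv1d_maxpool(seq, fs, w)
                  for w, fs in sorted(filters_by_width.items())]
    scores = [sum(c * v for c, v in zip(context, vec)) for vec in scale_vecs]
    weights = softmax(scores)
    n = len(scale_vecs[0])
    text_vec = [sum(a * vec[i] for a, vec in zip(weights, scale_vecs))
                for i in range(n)]
    return text_vec, weights


def user_features(seq, filters_by_width, context, graph_emb):
    """Concatenate the attended text vector with a hypothetical, precomputed
    friendship-network embedding; the result would feed the profile classifiers."""
    text_vec, weights = attentional_multiscale(seq, filters_by_width, context)
    return text_vec + list(graph_emb), weights


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed; weights are random, not trained
    dim, n_filters = 4, 3
    tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
    filters = {w: [[rng.uniform(-1, 1) for _ in range(w * dim)]
                   for _ in range(n_filters)]
               for w in (1, 2, 3)}
    context = [rng.uniform(-1, 1) for _ in range(n_filters)]
    graph_emb = [0.1, -0.2]  # hypothetical friendship-network embedding
    feats, weights = user_features(tokens, filters, context, graph_emb)
    print(len(feats), [round(a, 3) for a in weights])
```

Concatenation is the simplest fusion choice; in a trained model the filters and the context vector would be learned jointly with the age, gender, and region classifiers.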
Notes
- 3. As training data is insufficient, it is difficult for the model to learn how to map a specific location mentioned in user-generated content to the region it belongs to. Hence, we construct a region dictionary using geographic knowledge and Sina Weibo location information to help our model learn the relation between locations and regions.
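The note does not say how the region dictionary enters the model. One plausible use, sketched here purely as an assumption, is to turn location mentions in a user's posts into per-region count features. The region names, the `LOCATION_TO_REGION` entries, and `region_features` are all hypothetical; the actual dictionary built from geographic knowledge and Sina Weibo location information is not reproduced.

```python
from collections import Counter

# Illustrative subset of regions and locations (hypothetical entries).
REGIONS = ["North China", "Northeast China", "East China", "South China"]

LOCATION_TO_REGION = {
    "Beijing": "North China",
    "Tianjin": "North China",
    "Harbin": "Northeast China",
    "Shanghai": "East China",
    "Hangzhou": "East China",
    "Guangzhou": "South China",
    "Shenzhen": "South China",
}


def region_features(posts, location_to_region=LOCATION_TO_REGION, regions=REGIONS):
    """Count dictionary-location mentions per region across the posts, one
    count per (post, location) match. Plain substring matching also works
    on unsegmented Chinese text."""
    counts = Counter()
    for post in posts:
        for location, region in location_to_region.items():
            if location in post:
                counts[region] += 1
    # Fixed region order so the output can be used as a feature vector.
    return [counts[region] for region in regions]


posts = [
    "Landed in Shenzhen today",
    "Back to Shenzhen after a trip to Beijing",
    "The weather in Hangzhou is great",
]
print(region_features(posts))  # [1, 0, 1, 2]
```

Substring matching is deliberately naive: a mention such as "Beijing duck" would still count toward North China, so a real dictionary would need some handling of ambiguous names.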
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61370165, U1636103, 61632011), Shenzhen Foundational Research Funding (JCYJ20150625142543470, JCYJ20170307150024907), and the Guangdong Provincial Engineering Technology Research Center for Data Science (2016KF09).
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhao, Z., Du, J., Gao, Q., Gui, L., Xu, R. (2017). Inferring User Profile Using Microblog Content and Friendship Network. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_3
DOI: https://doi.org/10.1007/978-981-10-6805-8_3
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6804-1
Online ISBN: 978-981-10-6805-8
eBook Packages: Computer Science, Computer Science (R0)