Inferring User Profile Using Microblog Content and Friendship Network

  • Conference paper
Social Media Processing (SMP 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 774)

Abstract

With the rapid growth of microblogs in recent years, accurately predicting microblog user profiles has become valuable for marketing, personalized recommendation, and legal investigation. Microblog users post rich content every day and build a complex friendship network through “following” behavior. Both user-generated content and the friendship network are crucial for user profiling. In this work, we propose a neural-network-based model for user profiling. It takes advantage of both user-generated content and the friendship network by combining attentional multi-scale convolutional neural networks with graph embeddings. We evaluate our model on the SMP CUP 2016 dataset, whose task is to infer the age, gender, and region of microblog users. The experimental results show that, by utilizing information from both user-generated content and the friendship network, our method achieves state-of-the-art performance on all three sub-tasks.
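
To make the general idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "multi-scale" means convolution filters of several window widths over word vectors, that "attentional" pooling is a softmax-weighted sum over window activations, and that the content features are simply concatenated with a precomputed graph embedding (for example, one produced by LINE). All function names, filter shapes, and dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d(tokens, filt):
    """Slide one filter (k rows of d weights) over the token vectors.

    Returns one tanh activation per window of k consecutive tokens.
    """
    k = len(filt)
    out = []
    for i in range(len(tokens) - k + 1):
        window = tokens[i:i + k]
        s = sum(w * x
                for row_w, row_x in zip(filt, window)
                for w, x in zip(row_w, row_x))
        out.append(math.tanh(s))
    return out


def attention_pool(features):
    """Softmax-weighted sum of activations.

    Assumption: the attention scores are the activations themselves;
    a trained model would normally learn a separate scoring function.
    """
    weights = softmax(features)
    return sum(w * f for w, f in zip(weights, features))


def user_representation(tokens, filters_by_width, graph_vec):
    """Concatenate attention-pooled multi-scale features with a graph embedding."""
    content = []
    for width in sorted(filters_by_width):
        for filt in filters_by_width[width]:
            content.append(attention_pool(conv1d(tokens, filt)))
    return content + list(graph_vec)


# Hypothetical toy input: 5 tokens with 4-dimensional word vectors.
random.seed(0)
dim = 4
tokens = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
# Two randomly initialised filters for each window width 1, 2 and 3.
filters_by_width = {
    k: [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(k)]
        for _ in range(2)]
    for k in (1, 2, 3)
}
graph_vec = [0.1, -0.3, 0.2, 0.5]  # stand-in for a precomputed node embedding
print(len(user_representation(tokens, filters_by_width, graph_vec)))
```

In a full model, the resulting vector would feed a separate classifier for each of the age, gender, and region sub-tasks, with the filters and attention parameters learned from data rather than drawn at random.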

Notes

  1. https://biendata.com/competition/smpcup2016/.

  2. http://pyltp.readthedocs.io/zh_CN/latest/.

  3. Because the training data is insufficient, it is difficult for the model to learn how to map a specific location that appears in user-generated content to the region it belongs to. Hence, we construct a region dictionary using geographic knowledge and Sina Weibo location information to help our model find the relation between location and region.

  4. https://nlp.stanford.edu/projects/glove/.

  5. https://github.com/tangjianpku/LINE.

Acknowledgments

This work was supported by the National Natural Science Foundation of China 61370165, U1636103, 61632011, Shenzhen Foundational Research Funding JCYJ20150625142543470, JCYJ20170307150024907 and Guangdong Provincial Engineering Technology Research Center for Data Science 2016KF09.

Author information

Corresponding author

Correspondence to Ruifeng Xu.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhao, Z., Du, J., Gao, Q., Gui, L., Xu, R. (2017). Inferring User Profile Using Microblog Content and Friendship Network. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_3

  • DOI: https://doi.org/10.1007/978-981-10-6805-8_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6804-1

  • Online ISBN: 978-981-10-6805-8

  • eBook Packages: Computer Science (R0)
