Neural Gender Prediction from News Browsing Data

  • Conference paper
Chinese Computational Linguistics (CCL 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11856)

Abstract

Online news platforms attract large numbers of users who read digital news. Demographic information about these users, such as gender, is critical for providing personalized services such as news recommendation and targeted advertising. However, gender information is unavailable for many users of online news platforms. Fortunately, male and female users usually show different patterns in reading online news, so users' news browsing data can provide useful clues for inferring their gender. In this paper, we propose a neural gender prediction approach based on the news browsing data of users. A news article usually contains several kinds of information, such as a title, a body, and categories. The characteristics of these components differ considerably, so they should be processed differently. We therefore propose to learn unified user representations for gender prediction by treating the different components of browsed news as different views of users. In each view, we use a hierarchical framework that first learns news representations and then learns user representations from those news representations. Because different words in news titles and bodies are not equally informative for learning news representations, we use attention mechanisms to select important words. Similarly, because different news articles are not equally informative for gender prediction, we use news-level attention to attend to important news articles when learning user representations. Extensive experiments on a real-world dataset validate the effectiveness of our approach.
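The hierarchy described in the abstract (words to news, news to user, one such stack per view) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the additive attention form, the random word vectors standing in for encoder outputs, the concatenation of views, the logistic output layer, and all names (`attention_pool`, `user_view`, `init_params`) are assumptions made for illustration only.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(H, W, b, q):
    """Additive attention pooling over the rows of H (n x d).

    Assumed form: score_i = q . tanh(W^T h_i + b); the output is the
    attention-weighted sum of the rows.
    """
    scores = np.tanh(H @ W + b) @ q   # shape (n,)
    alpha = softmax(scores)           # attention weights, sum to 1
    return alpha @ H, alpha           # pooled vector (d,), weights (n,)

def user_view(news_word_vecs, word_params, news_params):
    """One view: word-level attention per article, then news-level attention."""
    news_reprs = np.stack(
        [attention_pool(words, *word_params)[0] for words in news_word_vecs]
    )
    return attention_pool(news_reprs, *news_params)

rng = np.random.default_rng(0)
d, a = 8, 4  # hypothetical embedding size and attention size

def init_params():
    # Randomly initialised, untrained parameters (illustration only).
    return rng.normal(size=(d, a)), np.zeros(a), rng.normal(size=a)

# Toy user with three browsed articles; random vectors stand in for
# the word representations a real encoder would produce.
titles = [rng.normal(size=(n, d)) for n in (5, 7, 4)]
bodies = [rng.normal(size=(n, d)) for n in (20, 15, 30)]

title_view, title_alpha = user_view(titles, init_params(), init_params())
body_view, body_alpha = user_view(bodies, init_params(), init_params())

# Assumed fusion: concatenate the views, then apply a logistic output layer.
user_vec = np.concatenate([title_view, body_view])
w_out = rng.normal(size=user_vec.shape[0])
p = 1.0 / (1.0 + np.exp(-(user_vec @ w_out)))
```

Here `title_alpha` and `body_alpha` hold one weight per browsed article, which is the quantity a news-level attention would use to emphasize informative articles. A categories view could be added in the same way by pooling category embeddings per article.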

Notes

  1. https://www.msn.com/en-us/news.

Acknowledgments

The authors would like to thank Microsoft News for providing technical support and data in the experiments, and Jiun-Hung Chen (Microsoft News) and Ying Qiao (Microsoft News) for their support and discussions. This work was supported by the National Key Research and Development Program of China under Grant number 2018YFC1604002, the National Natural Science Foundation of China under Grant numbers U1836204, U1705261, U1636113, U1536201, and U1536207, and the Tsinghua University Initiative Scientific Research Program.

Corresponding author

Correspondence to Chuhan Wu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, C., Wu, F., Qi, T., Huang, Y., Xie, X. (2019). Neural Gender Prediction from News Browsing Data. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science, vol 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_53

  • DOI: https://doi.org/10.1007/978-3-030-32381-3_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32380-6

  • Online ISBN: 978-3-030-32381-3

  • eBook Packages: Computer Science (R0)
