Age Detection for Chinese Users in Weibo

  • Conference paper
Web-Age Information Management (WAIM 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9098)

Abstract

Age is one of the most important attributes in a user’s profile. Age detection has many applications, such as personalized search, targeted advertising, and recommendation. Existing research has uncovered, to some extent, the relationship between the use of Western languages and social identity. However, age detection for Chinese users has so far been unexplored. Because of cultural and societal differences, some well-known features for English may not apply to Chinese users. For example, while the frequency of capitalized letters has proved to be a good feature in English, Chinese text has no such pattern. Moreover, Chinese has its own characteristics, such as rich emoticons, complex syntax, and unique lexical structures. Age detection for Chinese users is therefore a new and substantial challenge.

In this paper, we present an age detection study on a corpus of microblogs from 3200 users of Sina Weibo. We construct three types of Chinese language patterns (stylistic, lexical, and syntactic features) and investigate their effects on age prediction. We find a number of interesting language patterns: (1) topics diverge significantly among Chinese users in different age groups, (2) young people are open to new slang and readily adopt it from the internet or from foreign languages, and (3) young adults exhibit syntactic structures distinct from those of all other groups. Our best result reaches an accuracy of 88% when classifying users into four age groups.
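The abstract names stylistic and lexical features without listing them, so the following is only a minimal sketch of what per-user feature extraction of this kind might look like. Everything in it is an assumption made for illustration: the bracketed emoticon pattern, the punctuation set, the four stylistic counts, and the use of character bigrams as a lexical feature are not taken from the paper, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: emoticons appear as short bracketed names such as [哈哈].
EMOTICON = re.compile(r"\[[^\[\]\s]{1,8}\]")
# Runs of Latin letters, a rough proxy for borrowed foreign words.
LATIN_WORD = re.compile(r"[A-Za-z]+")
# Runs of characters in the CJK Unified Ideographs block.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")
# Assumption: a small illustrative set of Chinese and ASCII punctuation.
PUNCT = set("，。！？、；：…~!?,.;:")


def stylistic_features(posts):
    """Average a few simple stylistic counts over one user's posts."""
    n = max(len(posts), 1)  # avoid division by zero for users with no posts
    return {
        "avg_length": sum(len(p) for p in posts) / n,
        "emoticons_per_post": sum(len(EMOTICON.findall(p)) for p in posts) / n,
        "latin_words_per_post": sum(len(LATIN_WORD.findall(p)) for p in posts) / n,
        "punct_per_post": sum(ch in PUNCT for p in posts for ch in p) / n,
    }


def char_bigrams(posts):
    """Count character bigrams inside each run of Chinese characters."""
    counts = Counter()
    for p in posts:
        clean = EMOTICON.sub("", p)  # keep emoticon names out of the lexicon
        for run in CJK_RUN.findall(clean):
            counts.update(a + b for a, b in zip(run, run[1:]))
    return counts


posts = ["今天好开心[哈哈]！", "OMG 太萌了~"]
features = stylistic_features(posts)
bigrams = char_bigrams(posts)
```

Feature dictionaries like these could then be vectorized and passed to any standard classifier over the four age groups. Syntactic features would need a Chinese parser or part-of-speech tagger and are left out of this sketch.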

Author information

Corresponding author

Correspondence to Tieyun Qian.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, L., Qian, T., Wang, F., You, Z., Peng, Q., Zhong, M. (2015). Age Detection for Chinese Users in Weibo. In: Dong, X., Yu, X., Li, J., Sun, Y. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9098. Springer, Cham. https://doi.org/10.1007/978-3-319-21042-1_7

  • DOI: https://doi.org/10.1007/978-3-319-21042-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21041-4

  • Online ISBN: 978-3-319-21042-1

  • eBook Packages: Computer Science (R0)
