DOI: 10.1145/3443279.3443300
research-article

Age Inference on Twitter using SAGE and TF-IGM

Published: 01 February 2021

ABSTRACT

Social media is increasingly influential in day-to-day life. More than ever, people share, post, like, and follow activities across disparate social media platforms. Deriving specific attributes of users from their online behavior is a growing research field. In this study, we propose a novel methodology for determining the age of Twitter users. We classify users into three age groups: 18--24, 25--54, and 55 and older. We compute numerous linguistic features from users' tweets, obtain significant terms extracted by the SAGE algorithm, and retrieve relevant metadata by extracting information on the interests users follow on Twitter using TF-IGM. The final logistic regression model obtains a macro F1-score of 78%. The approach thus effectively combines NLP and IR techniques for attribute inference on social media.
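As an illustration of the TF-IGM weighting named in the abstract, the following is a minimal sketch of the term weight in its commonly cited form, w = tf * (1 + lambda * igm), where igm is the highest class frequency of a term divided by the rank-weighted sum of its class frequencies sorted in descending order. The abstract does not state how the authors configured it, so the function names, the default lambda of 7.0, and the example counts below are assumptions for illustration only.

```python
from typing import Sequence


def igm(class_freqs: Sequence[float]) -> float:
    """Inverse gravity moment of a term from its per-class frequencies.

    Frequencies are sorted in descending order and weighted by rank
    (1 for the largest). A term concentrated in one class scores 1.0;
    a term spread evenly over m classes scores 2 / (m * (m + 1)).
    """
    ranked = sorted(class_freqs, reverse=True)
    if not ranked or ranked[0] == 0:
        return 0.0  # term never occurs: no class-distinguishing power
    moment = sum(freq * rank for rank, freq in enumerate(ranked, start=1))
    return ranked[0] / moment


def tf_igm(tf: float, class_freqs: Sequence[float], lam: float = 7.0) -> float:
    """TF-IGM weight: term frequency scaled by (1 + lam * igm).

    lam is an adjustable coefficient; 7.0 is an assumed default here.
    """
    return tf * (1.0 + lam * igm(class_freqs))


# Hypothetical counts of users in three age groups following an account.
concentrated = [10, 0, 0]  # followed by one age group only
uniform = [5, 5, 5]        # followed equally by all three groups

weight_concentrated = tf_igm(2, concentrated)
weight_uniform = tf_igm(2, uniform)
```

With these made-up counts, the account followed by a single age group receives a much larger weight than the one followed evenly by all three, which is the property that makes such a weighting useful for separating classes.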

Published in

NLPIR '20: Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval
December 2020
217 pages
ISBN: 9781450377607
DOI: 10.1145/3443279

Copyright © 2020 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited
