research-article

Age Inference on Twitter using SAGE and TF-IGM

Authors:
Joran Cornelisse, University of Amsterdam, Sciencepark, Amsterdam
Reshmi Gopalakrishna Pillai, University of Amsterdam, Sciencepark, Amsterdam

NLPIR '20: Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, December 2020, Pages 24–30. https://doi.org/10.1145/3443279.3443300

Published: 01 February 2021

ABSTRACT

Social media is increasingly influential in day-to-day life. More than ever, people share, post, like, and follow activities across disparate social media platforms. Deriving specific attributes of users from their online behavior is a growing research field. In this study, we propose a novel methodology for determining the age of Twitter users. We classify users into three age groups: 18–24, 25–54, and 55 and older. We compute numerous linguistic features from users' tweets, obtain significant terms extracted by the SAGE algorithm, and retrieve relevant metadata by weighting the interests that users follow on Twitter with TF-IGM. The final logistic regression model obtains a macro F1-score of 78%. In this way, the approach effectively combines NLP and IR techniques for attribute inference on social media.
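
To make the TF-IGM step more concrete, here is a minimal sketch of the commonly used TF-IGM weight, w = tf · (1 + λ · igm), where igm is the highest per-class frequency of a term divided by the rank-weighted sum of its per-class frequencies sorted in descending order. The abstract does not say how the weighting was applied to followed interests, so the setup is an assumption: each followed account is treated as a "term", and its class frequencies are the hypothetical counts of users in each of the three age groups who follow it. The function names and the default λ = 7.0 are illustrative choices, not taken from the paper.

```python
def igm(class_freqs):
    """Inverse gravity moment of a term.

    class_freqs: frequency of the term in each class (here, assumed to be
    the number of users per age group who follow a given account).
    """
    ranked = sorted(class_freqs, reverse=True)
    if not ranked or ranked[0] == 0:
        return 0.0  # term never occurs, so it carries no class information
    # Rank-weighted sum: the r-th largest frequency is multiplied by r.
    moment = sum(freq * rank for rank, freq in enumerate(ranked, start=1))
    return ranked[0] / moment


def tf_igm(tf, class_freqs, lam=7.0):
    """TF-IGM weight: term frequency scaled by class-distinguishing power.

    lam is an adjustable coefficient; 7.0 is an assumed default here.
    """
    return tf * (1.0 + lam * igm(class_freqs))


# Hypothetical follower counts per age group (18-24, 25-54, 55 and older).
concentrated = [30, 0, 0]   # followed almost only by one group
uniform = [10, 10, 10]      # followed equally by all groups

print(igm(concentrated), igm(uniform))
print(tf_igm(1, concentrated), tf_igm(1, uniform))
```

An account followed by only one age group gets the maximum igm of 1, while an account followed equally by all three groups gets the minimum for three classes, 1/6, so it contributes far less weight to the resulting feature vector.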

Index Terms

Information systems → Information retrieval → Specialized information retrieval → Environment-specific retrieval → Web and social media search

Published in

NLPIR '20: Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval
December 2020
217 pages
ISBN:9781450377607
DOI:10.1145/3443279

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Age
Attribute inference
IR
NLP
SAGE
TF-IGM
Twitter
Qualifiers
- research-article
- Research
- Refereed limited