ABSTRACT
Social media is increasingly influential in day-to-day life. More than ever, people share, post, like, and follow content across a range of social media platforms. Inferring specific attributes of users from their online behavior is a growing research field. In this study, we propose a novel methodology for determining the age of Twitter users. We classify users into three age groups: 18-24, 25-54, and 55+. We compute numerous linguistic features from users' tweets, extract significant terms with the SAGE algorithm, and retrieve relevant user metadata by extracting the interests users follow on Twitter and weighting them with TF-IGM. The final logistic regression model achieves a macro F1-score of 78%. The approach thus effectively combines NLP and IR techniques for attribute inference on social media.
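TF-IGM (Chen et al., 2016) weights a term by how unevenly it is distributed across classes, rewarding terms concentrated in a single class. Below is a minimal Python sketch of the standard formulation, weight = tf(t, d) * (1 + λ * igm(t)) with igm(t) = f_1 / Σ_r (f_r * r) over class frequencies sorted in descending order. It assumes that each user is treated as a "document" whose "terms" are the interests the user follows, and that the classes are the three age groups. The function names, the toy data, and the default `lam` = 7.0 (a value commonly used in the TF-IGM literature) are illustrative assumptions. The exact variant used in the study, such as raw versus square-rooted term frequency, is not stated in the abstract.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def igm_scores(docs: Sequence[Sequence[str]], labels: Sequence[str]) -> Dict[str, float]:
    """Inverse gravity moment (IGM) of every term, following Chen et al. (2016).

    For a term t, let f_1 >= f_2 >= ... >= f_m be the number of documents
    containing t in each of the m classes, sorted in descending order.
    Then igm(t) = f_1 / sum_{r=1..m} (f_r * r).
    """
    classes = sorted(set(labels))
    # term -> class label -> number of documents of that class containing the term
    class_df: Dict[str, Counter] = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for term in set(doc):  # count each term at most once per document
            class_df[term][label] += 1

    scores: Dict[str, float] = {}
    for term, per_class in class_df.items():
        # Counter returns 0 for classes in which the term never appears
        freqs = sorted((per_class[c] for c in classes), reverse=True)
        gravity_moment = sum(f * rank for rank, f in enumerate(freqs, start=1))
        # igm lies in (0, 1]; it equals 1 when the term occurs in one class only
        scores[term] = freqs[0] / gravity_moment
    return scores


def tf_igm(doc: Sequence[str], igm: Dict[str, float], lam: float = 7.0) -> Dict[str, float]:
    """TF-IGM weights for one document: tf(t, d) * (1 + lam * igm(t)).

    Terms not seen when computing `igm` are skipped.
    """
    counts = Counter(doc)
    return {t: c * (1.0 + lam * igm[t]) for t, c in counts.items() if t in igm}


if __name__ == "__main__":
    # Hypothetical toy data: each "document" lists the interests one user follows.
    users = [
        ["gaming", "music"],
        ["gaming", "news"],
        ["news", "finance"],
        ["finance", "gardening"],
    ]
    ages = ["18-24", "18-24", "25-54", "55+"]
    igm = igm_scores(users, ages)
    print(igm)
    print(tf_igm(["gaming", "gaming", "news"], igm))
```

With three classes, an interest followed within only one age group gets igm = 1 and the largest boost, while an interest spread evenly over all three groups gets igm = 1/6. Vectors weighted this way could then be combined with the linguistic and SAGE features as input to a logistic regression classifier.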
Index Terms
- Age Inference on Twitter using SAGE and TF-IGM