Abstract
In this article, we address the problem of automatically estimating the age of web users from their posts. Most studies on age identification treat the issue as a classification problem over age categories. Instead of following this category-based approach, we treat age estimation as a numerical prediction task and evaluate a number of well-known and widely used regression algorithms to examine their suitability for it, using a set of 42 text features. The experimental results show that the Bagging algorithm with a REPTree base learner offers the best performance, estimating web users’ age with a mean absolute error of 5.44 and a root mean squared error of approximately 7.14.
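The regression setup summarized above can be illustrated with a minimal, dependency-free Python sketch of bagging for numerical targets, together with the two error measures the abstract reports: MAE = (1/n) Σ |y_i − ŷ_i| and RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²). This is a sketch under stated assumptions, not the authors' code. The base learner is a simple depth-limited regression tree split on squared-error reduction, swapped in for REPTree and lacking REPTree's reduced-error pruning. The function names, the parameters `n_estimators`, `max_depth` and `min_leaf`, and the two-feature toy data are all hypothetical (the paper uses 42 text features).

```python
import math
import random
from statistics import mean

# Sketch only: names, parameters and the toy data are hypothetical; the base
# learner is a plain regression tree, NOT Weka's REPTree (no reduced-error pruning).


def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum(|y - y_hat|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt((1/n) * sum((y - y_hat)^2))."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def sse(ys):
    """Sum of squared deviations from the mean (the split criterion)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Grow a regression tree greedily by squared-error reduction.

    Internal nodes are (feature, threshold, left, right) tuples; leaves are the
    mean target of the training rows that reach them.
    """
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None  # (total SSE after split, feature index, threshold)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None or best[0] >= sse(y):
        return mean(y)  # no split reduces the error: make a leaf
    _, f, thr = best
    go_left = [row[f] <= thr for row in X]
    children = []
    for side in (True, False):  # True = left branch (value <= threshold)
        xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [t for t, g in zip(y, go_left) if g == side]
        children.append(build_tree(xs, ys, depth + 1, max_depth, min_leaf))
    return (f, thr, children[0], children[1])


def predict_tree(node, row):
    """Follow thresholds from the root down to a leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def bagging_fit(X, y, n_estimators=10, seed=0):
    """Bagging: fit each tree on a bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx]))
    return trees


def bagging_predict(trees, row):
    """Regression bagging averages the predictions of the individual trees."""
    return mean(predict_tree(t, row) for t in trees)


if __name__ == "__main__":
    # Made-up toy data: two hypothetical numeric text features per blogger, plus age.
    X = [[0.1, 3], [0.2, 4], [0.3, 5], [0.4, 6], [0.6, 8], [0.7, 9], [0.8, 10], [0.9, 11]]
    y = [18, 20, 23, 25, 35, 38, 41, 45]
    trees = bagging_fit(X, y, n_estimators=15, seed=1)
    preds = [bagging_predict(trees, row) for row in X]
    # Training-set error, only to exercise the metrics; not a real evaluation.
    print(f"MAE={mae(y, preds):.2f}  RMSE={rmse(y, preds):.2f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE and weighs large misses more heavily, which fits the ordering of the reported values (approximately 7.14 versus 5.44). Averaging many trees fit to bootstrap resamples mainly reduces the variance of an unstable learner such as a single tree; a real evaluation would measure these errors on held-out data rather than on the training set as the toy run does.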
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Simaki, V., Aravantinou, C., Mporas, I., Megalooikonomou, V. (2015). Automatic Estimation of Web Bloggers’ Age Using Regression Models. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_14
DOI: https://doi.org/10.1007/978-3-319-23132-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science; Computer Science (R0)