Abstract
The growth of social networking platforms such as Facebook and Twitter has opened communication channels through which people share their thoughts and sentiments. However, the rapid rise of the Internet has also brought anonymity, under which user identities are easily falsified or hidden. This makes it difficult for businesses to target advertisements accurately at specific account demographics. This study therefore searched for the best model to identify the gender and age group of Filipino social media accounts by analyzing post contents. Two model structures for the classifier, namely the stacked (combined) structure and the parallel structure, were tested. Different types of features were considered, including those based on socio-linguistics, grammar, characters, and words. The results show that age classification and gender classification call for different model structures, features, feature reduction methods, and classification algorithms. For age classification, the best model on both platforms was a Support Vector Classifier (SVC) with the least absolute shrinkage and selection operator (Lasso), using a parallel model structure for Facebook and a combined model structure for Twitter. For gender classification, the best model for Facebook used a Ridge Classifier (RC), while the best model for Twitter used SVC, both with Lasso on a parallel model structure. The dominant features in age classification on both Facebook and Twitter were word-based features, socio-linguistic features, and post time, while socio-linguistic features, specifically netspeak, were important in gender classification on both platforms. Because the features affecting model performance differ, Facebook and Twitter data must be analyzed separately, as the posts on the two platforms differ significantly.
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Cheng, J.K., Fernandez, A., Quindoza, R.G.M., Tan, S., Cheng, C. (2018). A Model for Age and Gender Profiling of Social Media Accounts Based on Post Contents. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11302. Springer, Cham. https://doi.org/10.1007/978-3-030-04179-3_10
DOI: https://doi.org/10.1007/978-3-030-04179-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04178-6
Online ISBN: 978-3-030-04179-3
eBook Packages: Computer Science (R0)