Abstract
The demographic attributes gender and age play an important role in social media applications. Previous studies on gender and age prediction mostly focus on designing effective features, which is labor-intensive. In this paper, we propose a multi-task convolutional neural network (MTCNN) model that predicts gender and age simultaneously for Chinese microblog users. MTCNN effectively reduces the burden of feature engineering and learns both common (shared) and unique (task-specific) representations for the two tasks. Experimental results show that our method significantly outperforms state-of-the-art baselines.
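The abstract does not specify MTCNN's exact architecture. The following is a minimal, forward-pass-only sketch of the general idea of "common and unique representations", under several assumptions that do not come from the paper: a single-layer text CNN with ReLU and max-over-time pooling, one filter bank shared by both tasks plus one private filter bank per task, gender as a 2-way label, age bucketed into 3 hypothetical groups, inputs already segmented into token ids, random initialization, and an unweighted sum of the two cross-entropy losses. The names `MultiTaskCNN`, `conv_maxpool`, and `joint_loss` are illustrative only.

```python
import math
import random


def conv_maxpool(embs, filters, width):
    """1-D convolution over token embeddings, ReLU, then max-over-time pooling.

    Returns one pooled feature per filter.
    """
    feats = []
    for weights, bias in filters:
        best = -math.inf
        for i in range(len(embs) - width + 1):
            # Flatten the embeddings in the current window into one vector.
            window = [x for vec in embs[i:i + width] for x in vec]
            score = sum(w * x for w, x in zip(weights, window)) + bias
            best = max(best, max(0.0, score))
        feats.append(best)
    return feats


def linear(rows, biases, x):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(rows, biases)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class MultiTaskCNN:
    """Hypothetical multi-task text CNN: a shared filter bank feeds both tasks,
    and each task also has its own private filter bank and softmax head."""

    TASKS = ("gender", "age")

    def __init__(self, vocab_size, dim=8, width=3, n_shared=4, n_private=2,
                 n_classes=None, seed=0):
        n_classes = n_classes or {"gender": 2, "age": 3}  # age groups: assumed
        rng = random.Random(seed)

        def rand_vec(n):
            return [rng.uniform(-0.1, 0.1) for _ in range(n)]

        def make_filters(k):
            return [(rand_vec(width * dim), 0.0) for _ in range(k)]

        self.dim, self.width = dim, width
        self.emb = [rand_vec(dim) for _ in range(vocab_size)]
        self.shared = make_filters(n_shared)
        self.private = {t: make_filters(n_private) for t in self.TASKS}
        n_in = n_shared + n_private
        self.heads = {
            t: ([rand_vec(n_in) for _ in range(n_classes[t])], [0.0] * n_classes[t])
            for t in self.TASKS
        }

    def forward(self, token_ids):
        embs = [self.emb[t] for t in token_ids]
        # Zero-pad short inputs so at least one convolution window exists.
        embs += [[0.0] * self.dim for _ in range(self.width - len(embs))]
        common = conv_maxpool(embs, self.shared, self.width)
        probs = {}
        for task in self.TASKS:
            # Concatenate shared ("common") and task-specific ("unique") features.
            h = common + conv_maxpool(embs, self.private[task], self.width)
            rows, biases = self.heads[task]
            probs[task] = softmax(linear(rows, biases, h))
        return probs

    def joint_loss(self, token_ids, gender, age):
        """Unweighted sum of the two cross-entropy losses (an assumption)."""
        p = self.forward(token_ids)
        return -math.log(p["gender"][gender]) - math.log(p["age"][age])
```

For example, `MultiTaskCNN(vocab_size=10).forward([1, 2, 3, 4])` returns a 2-way gender distribution and a 3-way age distribution. Training is omitted: backpropagating `joint_loss` would send gradients from both tasks into the shared filters, which is what lets one task's supervision shape the other's features. In practice an autodiff framework would handle the gradients.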
Acknowledgements
We thank all the anonymous reviewers for their insightful comments on this paper. This work was partially supported by National Natural Science Foundation of China (61273278 and 61572049).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Wang, L., Li, Q., Chen, X., Li, S. (2016). Multi-task Learning for Gender and Age Prediction on Chinese Microblog. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_16
DOI: https://doi.org/10.1007/978-3-319-50496-4_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)