Abstract
Author profiling consists of inferring an author's gender, age, native language, dialect, or personality from his or her written text. It is a very active research area because of its usefulness in criminal investigation, marketing, and business.
In this paper, we address the problem of gender identification by applying the Long Short-Term Memory (LSTM) neural network architecture. LSTM is a type of recurrent network architecture that uses an appropriate gradient-based learning algorithm to overcome the vanishing-gradient problem. Experimental results show that our approach outperformed traditional machine learning methods on gender identification.
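The following minimal, untrained sketch in plain Python makes the architecture concrete. It is not the authors' implementation. The layer sizes, the random initialization, the forget-gate bias of 1.0, the use of each direction's final hidden state, and the single sigmoid output for a binary gender label are all illustrative assumptions. The token vectors are random placeholders for a learned embedding layer, and training (backpropagation through time) is omitted.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size  # each gate sees [x_t; h_{t-1}]

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W = {gate: init(hidden_size, n) for gate in ("i", "f", "o", "g")}
        # Assumption: forget-gate bias 1.0, a common heuristic; other biases 0.
        self.b = {gate: [1.0 if gate == "f" else 0.0] * hidden_size for gate in ("i", "f", "o", "g")}

    def step(self, x, h, c):
        z = x + h  # list concatenation [x_t; h_{t-1}]
        pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])] for gate in self.W}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        # Additive cell-state update c_t = f_t * c_{t-1} + i_t * g_t: the gradient
        # path through c is scaled by f_t rather than repeatedly by a weight matrix,
        # which is how LSTMs mitigate the vanishing-gradient problem.
        c_new = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, cand)]
        h_new = [ot * math.tanh(ct) for ot, ct in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        """Run over a sequence of vectors and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in xs:
            h, c = self.step(x, h, c)
        return h


class BiLSTMGenderClassifier:
    """Hypothetical BiLSTM text classifier with one sigmoid output unit."""

    def __init__(self, embed_dim, hidden_size, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMCell(embed_dim, hidden_size, rng)
        self.bwd = LSTMCell(embed_dim, hidden_size, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_size)]
        self.b_out = 0.0

    def encode(self, xs):
        # Read the text left-to-right and right-to-left, then concatenate
        # the two final hidden states.
        return self.fwd.run(xs) + self.bwd.run(list(reversed(xs)))

    def predict_proba(self, xs):
        """Probability of the positive class (the label mapping is hypothetical)."""
        h = self.encode(xs)
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)


if __name__ == "__main__":
    rng = random.Random(42)
    tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]  # 5 tokens, 8-dim vectors
    model = BiLSTMGenderClassifier(embed_dim=8, hidden_size=4)
    print(len(model.encode(tokens)))  # 8 (= 2 * hidden_size)
    print(0.0 < model.predict_proba(tokens) < 1.0)  # True
```

The bidirectional design lets the representation of a text reflect both the preceding and following context of each token. A trained model would learn the weights from labelled texts rather than drawing them at random.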
Notes
1. Wikipedia, “Wikimedia Downloads.” https://dumps.wikimedia.org/arwiki/20170401/, 2017. [Online; accessed 10 Apr 2017]
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Bsir, B., Zrigui, M. (2018). Bidirectional LSTM for Author Gender Identification. In: Nguyen, N., Pimenidis, E., Khan, Z., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2018. Lecture Notes in Computer Science, vol 11055. Springer, Cham. https://doi.org/10.1007/978-3-319-98443-8_36
DOI: https://doi.org/10.1007/978-3-319-98443-8_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98442-1
Online ISBN: 978-3-319-98443-8
eBook Packages: Computer Science, Computer Science (R0)