Abstract
The rising use of social media has motivated different methods of anonymous writing, which has led to an increase in malicious and suspicious activities. This anonymity makes it difficult to identify a suspect. Author profiling deals with characterizing an author through key attributes such as gender, age, language, dialect or regional variety, and personality. Identifying the gender of the writer of a suspect document is a popular task in this area. Users regularly share their daily activities on social media platforms such as Twitter, Facebook, and Instagram. In this paper, we propose a neural architecture for gender prediction on multimodal Twitter data. A bidirectional GRU learns the encoded representation of the text part of a tweet, and ResNet-50 extracts features from the images. Different types of attention networks are applied to fuse the text and image representations, followed by a fully connected layer that predicts the gender of a Twitter user. The PAN-2018 author profiling data is used to evaluate the proposed approach. Experimental results show that weighted attention performs best for the gender prediction task. Our model achieves an accuracy of 84.03% and outperforms the previous state-of-the-art works. A detailed analysis of the developed system is also carried out, which demonstrates the writing patterns of male and female users.
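The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the weighted attention is computed, so the code below shows only one plausible reading: a softmax over one relevance score per modality, followed by a weighted sum and a single sigmoid output unit. The function names, the `query` vector, and all numeric values are hypothetical, and the short lists stand in for the BiGRU text encoding and the ResNet-50 image features, which are not computed here.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_attention_fusion(text_vec, image_vec, query):
    """Fuse two same-sized modality vectors with softmax attention weights.

    Assumption: `query` is a learned vector that scores each modality.
    This is an illustrative stand-in, not the paper's exact formulation.
    """
    if not (len(text_vec) == len(image_vec) == len(query)):
        raise ValueError("all vectors must have the same dimension")
    # One relevance score per modality, normalised so the weights sum to 1.
    weights = softmax([dot(query, text_vec), dot(query, image_vec)])
    fused = [weights[0] * t + weights[1] * i
             for t, i in zip(text_vec, image_vec)]
    return fused, weights


def predict_probability(fused, fc_weights, fc_bias):
    """Single fully connected unit with a sigmoid, giving a class probability."""
    z = dot(fc_weights, fused) + fc_bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy inputs standing in for the text and image representations.
text_vec = [0.2, -0.1, 0.4]
image_vec = [0.0, 0.3, -0.2]
query = [1.0, 0.5, -0.5]

fused, weights = weighted_attention_fusion(text_vec, image_vec, query)
prob = predict_probability(fused, fc_weights=[0.5, -0.3, 0.8], fc_bias=0.1)
```

In a trained model the query vector and the fully connected weights would be learned jointly with the encoders; here they are fixed by hand only to make the data flow visible.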
Cite this article
Suman, C., Chaudhary, R.S., Saha, S. et al. An attention based multi-modal gender identification system for social media users. Multimed Tools Appl 81, 27033–27055 (2022). https://doi.org/10.1007/s11042-021-11256-6
DOI: https://doi.org/10.1007/s11042-021-11256-6