CMHE-AN: Code mixed hybrid embedding based attention network for aggression identification in hindi english code-mixed text

Mundra, Shikha; Mittal, Namita

doi:10.1007/s11042-022-13668-4


1226: Deep-Patterns Emotion Recognition in the Wild
Published: 09 September 2022

Volume 82, pages 11337–11364 (2023)

Multimedia Tools and Applications


Abstract

The widespread growth of social media platforms provides many opportunities to enhance interaction and raise awareness of current events across countries. Many people use social media to share their thoughts and opinions on societal and political issues. Nonetheless, some individuals misuse these platforms by posting toxic, hostile, and insulting comments. Detecting and controlling such content at the earliest stage is therefore crucial, since its spread can harm social relations and negatively affect people's lives. The volume of social media text in non-English languages is growing because of active participation from multilingual societies, and in India, Hindi-English code-mixed text is especially prevalent. Most previous work on cyber-aggression detection concentrates on English text, leaving considerable scope for work on other languages such as Hindi-English code-mixed text. This paper proposes a code-mixed hybrid embedding (CMHE) at the character and word levels to capture similarly spelled and contextually related words. Furthermore, the proposed embedding significantly reduces the number of out-of-vocabulary words and captures words with similar polarity. A deep learning framework based on CMHE and a self-attention mechanism is then proposed to extract salient features for classification. To evaluate the proposed model, experiments were performed on two publicly available datasets: the TRAC-2 (2020) Hindi-English code-mixed dataset (77.54% accuracy, 77.09% weighted-average F1 score) and a hate speech dataset (75.23% accuracy, 73.34% weighted-average F1 score). The experimental results demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods.
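The abstract combines character-level and word-level representations so that spelling variants share information and out-of-vocabulary words still receive a vector, and then uses attention to weight informative tokens. The article body is not included here, so the following is only a minimal, hypothetical Python sketch of that general idea, not the authors' CMHE-AN architecture. It assumes: a word vector (zeros when the word is out of vocabulary) concatenated with the average of fastText-style character n-gram vectors; deterministic hashed random vectors standing in for trained embedding tables; and simple attention pooling with a single query vector in place of the paper's self-attention layer. All names (`char_ngrams`, `hashed_vector`, `hybrid_embedding`, `attention_pool`, the toy vocabulary) are illustrative.

```python
import hashlib
import math
import random

DIM = 8  # toy embedding size (assumption); a real model would use a larger, trained dimension


def char_ngrams(word, n_min=3, n_max=5):
    """Return fastText-style character n-grams of a word, with < and > boundary markers."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def hashed_vector(key, dim=DIM):
    """Deterministic pseudo-random vector for a string key.

    Hypothetical placeholder for a row of a trained embedding table.
    """
    seed = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def char_vector(word):
    """Average the vectors of a word's character n-grams (zeros if it has none)."""
    grams = char_ngrams(word.lower())
    if not grams:
        return [0.0] * DIM
    vecs = [hashed_vector("ngram:" + g) for g in grams]
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def hybrid_embedding(word, word_table):
    """Concatenate a word-level vector (zeros if out of vocabulary) with a char-level vector."""
    word_vec = word_table.get(word.lower(), [0.0] * DIM)
    return word_vec + char_vector(word)


def attention_pool(vectors, query):
    """Softmax-weighted sum of token vectors, scored by dot product with a query vector."""
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in vectors]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * vec[d] for w, vec in zip(weights, vectors))
              for d in range(len(vectors[0]))]
    return pooled, weights


# Toy vocabulary of romanized Hindi words (illustrative only).
vocab = {w: hashed_vector("word:" + w) for w in ["pyaar", "gussa"]}
tokens = ["tum", "pyar", "gussa"]  # "tum" and the variant spelling "pyar" are OOV here
token_vecs = [hybrid_embedding(t, vocab) for t in tokens]
query = hashed_vector("query", 2 * DIM)  # would be a learned parameter in practice
sentence_vec, weights = attention_pool(token_vecs, query)

# n-grams the variant spelling "pyar" shares with the in-vocabulary word "pyaar"
shared = set(char_ngrams("pyar")) & set(char_ngrams("pyaar"))
print(sorted(shared))  # ['<py', '<pya', 'ar>', 'pya']
```

Concatenation keeps the word-level and character-level views separate so that a downstream layer can learn how much to rely on each; fastText instead sums subword vectors into a single word vector. Which combination the paper uses is not stated in this excerpt.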


Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.



Author information

Authors and Affiliations

Malaviya National Institute of Technology, Jaipur, India
Shikha Mundra & Namita Mittal
Manipal University Jaipur, Jaipur, India
Shikha Mundra


Corresponding author

Correspondence to Shikha Mundra.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Mundra, S., Mittal, N. CMHE-AN: Code mixed hybrid embedding based attention network for aggression identification in hindi english code-mixed text. Multimed Tools Appl 82, 11337–11364 (2023). https://doi.org/10.1007/s11042-022-13668-4


Received: 25 October 2021
Revised: 08 July 2022
Accepted: 11 August 2022
Published: 09 September 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s11042-022-13668-4
