
An Inter-modal Attention Framework for Multimodal Offense Detection

Conference paper in Intelligent Computing & Optimization (ICO 2022)

Abstract

In recent years, the posting of offensive content on social media platforms has increased, and memes play a vital role in continuing this trend. Memes usually come in multimodal form and can promulgate offensive views under the guise of humour. Because a meme’s visual and textual information is often disparate, such content can easily evade surveillance systems. Therefore, an automatic offensive meme detection system is required to combat malign content and maintain social media harmony. Past studies have explored various multimodal fusion techniques to combine the visual and textual information of memes. However, these techniques fail to learn the significant relationships within the multimodal content and perform poorly in offensive meme classification. This paper proposes an inter-modal attention framework that synergistically fuses the visual and textual information by focusing on modality-specific features. The state-of-the-art pre-trained VGG19 and BERT models are employed to extract the visual and textual features, respectively. The proposed method is compared against several baseline models on a publicly available multimodal offensive dataset. The results demonstrate that the proposed approach outperforms all baselines, achieving the highest weighted F1-score of 0.635.
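The abstract describes the fusion mechanism only at a high level. The snippet below is a minimal, hypothetical sketch (not the authors' released implementation) of how modality-specific features, e.g. a VGG19 fully-connected-layer vector and a BERT [CLS] embedding, could be fused with bidirectional cross-attention before classification; the dimensions, layer choices, and use of PyTorch's nn.MultiheadAttention are assumptions made purely for illustration.

```python
# Hypothetical inter-modal attention fusion block (illustrative sketch only).
# Assumes visual features from VGG19 (e.g., a 4096-d fc-layer output) and
# textual features from BERT (a 768-d [CLS] embedding) are already extracted.
import torch
import torch.nn as nn

class InterModalAttentionFusion(nn.Module):
    def __init__(self, visual_dim=4096, text_dim=768, hidden_dim=512, num_heads=4):
        super().__init__()
        # Project both modalities into a shared space.
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Cross-attention in both directions: text attends to image and vice versa.
        self.text_to_visual = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.visual_to_text = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, 2),  # offensive vs. non-offensive
        )

    def forward(self, visual_feats, text_feats):
        # visual_feats: (batch, visual_dim), text_feats: (batch, text_dim)
        v = self.visual_proj(visual_feats).unsqueeze(1)  # (batch, 1, hidden)
        t = self.text_proj(text_feats).unsqueeze(1)      # (batch, 1, hidden)
        # Text queries attend over visual keys/values, and vice versa.
        t_attended, _ = self.text_to_visual(query=t, key=v, value=v)
        v_attended, _ = self.visual_to_text(query=v, key=t, value=t)
        fused = torch.cat([t_attended.squeeze(1), v_attended.squeeze(1)], dim=-1)
        return self.classifier(fused)

# Example usage with random tensors standing in for VGG19/BERT outputs.
model = InterModalAttentionFusion()
logits = model(torch.randn(8, 4096), torch.randn(8, 768))
print(logits.shape)  # torch.Size([8, 2])
```

Attending in both directions lets each modality re-weight the other's features, which is the intuition behind inter-modal attention; in practice the inputs would come from fine-tuned VGG19 and BERT encoders rather than random tensors.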




Author information

Correspondence to Mohammed Moshiul Hoque.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Hossain, E., Hoque, M.M., Hossain, M.A. (2023). An Inter-modal Attention Framework for Multimodal Offense Detection. In: Vasant, P., Weber, GW., Marmolejo-Saucedo, J.A., Munapo, E., Thomas, J.J. (eds) Intelligent Computing & Optimization. ICO 2022. Lecture Notes in Networks and Systems, vol 569. Springer, Cham. https://doi.org/10.1007/978-3-031-19958-5_81
