
Combining Knowledge and Multi-modal Fusion for Meme Classification

  • Conference paper
MultiMedia Modeling (MMM 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13141)


Abstract

Internet memes are widespread on social media platforms such as Twitter and Facebook. Meme classification has recently become an active research topic, particularly meme sentiment classification and offensive meme classification. Memes carry multi-modal information: the meme text is embedded in the meme image. Existing methods classify memes by simply concatenating global visual and textual features to generate a multi-modal representation. However, these approaches ignore both the noise introduced by global visual features and the potential common information in the meme's multi-modal representation. In this paper, we propose a model for meme classification named MeBERT. Our method enhances the semantic representation of the meme by introducing conceptual information from external Knowledge Bases (KBs). Then, to reduce noise, a concept-image attention module is designed to extract a concept-sensitive visual representation. In addition, a deep convolution tensor fusion module is built to integrate the multi-modal information effectively. To verify the effectiveness of the model on meme sentiment classification and offensive meme classification, we conduct experiments on the Memotion and MultiOFF datasets. The experimental results show that MeBERT outperforms state-of-the-art meme classification techniques.
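The abstract names two architectural ideas: a concept-image attention module, in which KB-derived concepts attend over image regions so that noisy regions are down-weighted, and a convolutional tensor fusion of the textual and visual representations. The PyTorch sketch below is a minimal, hypothetical rendering of those two ideas, assuming standard scaled dot-product attention and an outer-product fusion; all class names (ConceptImageAttention, ConvTensorFusion), shapes, and layer sizes are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical sketch of the two modules described in the abstract.
# Assumptions: BERT-style 768-d features, 7x7 CNN feature-map regions,
# scaled dot-product attention, outer-product + convolution fusion.
import torch
import torch.nn as nn


class ConceptImageAttention(nn.Module):
    """KB concepts attend over image regions to pool a concept-sensitive
    visual vector, suppressing regions irrelevant to the meme's concepts."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects concept embeddings
        self.key = nn.Linear(dim, dim)    # projects image-region features
        self.value = nn.Linear(dim, dim)

    def forward(self, concepts: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # concepts: (batch, n_concepts, dim); regions: (batch, n_regions, dim)
        q, k, v = self.query(concepts), self.key(regions), self.value(regions)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        # Each concept pools the regions it attends to; averaging over
        # concepts yields one concept-sensitive visual representation.
        return (attn @ v).mean(dim=1)     # (batch, dim)


class ConvTensorFusion(nn.Module):
    """Fuse text and visual vectors via their outer product, then convolve."""

    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        self.classifier = nn.Linear(8 * 8 * 8, n_classes)

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # The outer product captures all pairwise cross-modal interactions,
        # unlike plain concatenation of global features.
        fused = torch.einsum("bi,bj->bij", text, visual).unsqueeze(1)  # (b,1,d,d)
        return self.classifier(self.conv(fused).flatten(1))


# Toy usage with random tensors standing in for real features.
attn = ConceptImageAttention(dim=768)
fusion = ConvTensorFusion(dim=768, n_classes=2)
concepts = torch.randn(4, 5, 768)   # KB concept embeddings for the meme text
regions = torch.randn(4, 49, 768)   # e.g. 7x7 CNN feature-map regions
text_vec = torch.randn(4, 768)      # BERT [CLS] vector of the meme text
logits = fusion(text_vec, attn(concepts, regions))
print(logits.shape)                 # torch.Size([4, 2])
```

The outer-product fusion is one common way to realize "tensor fusion"; whether MeBERT uses this exact form, and how its deep convolution stack is configured, is specified only in the full paper.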



Acknowledgement

This work is supported by the Chongqing Research Program of Basic Research and Frontier Technology under Grant No. cstc2019jcyj-msxmX0033.

Author information


Corresponding author

Correspondence to Qian Wang.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhong, Q., Wang, Q., Liu, J. (2022). Combining Knowledge and Multi-modal Fusion for Meme Classification. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_47


  • DOI: https://doi.org/10.1007/978-3-030-98358-1_47


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98357-4

  • Online ISBN: 978-3-030-98358-1

  • eBook Packages: Computer Science, Computer Science (R0)
