Abstract
Internet memes are widespread on social media platforms such as Twitter and Facebook. Recently, meme classification has become an active research topic, especially meme sentiment classification and meme offensive classification. Internet memes carry multi-modal information, with the meme text embedded in the meme image. Existing methods classify memes by simply concatenating global visual and textual features to generate a multi-modal representation. However, these approaches ignore the noise introduced by global visual features and the potential common information in a meme's multi-modal representation. In this paper, we propose a model for meme classification named MeBERT. Our method enhances the semantic representation of a meme by introducing conceptual information from external Knowledge Bases (KBs). To reduce noise, a concept-image attention module is designed to extract a concept-sensitive visual representation. In addition, a deep convolution tensor fusion module is built to integrate multi-modal information effectively. To verify the effectiveness of the model on the tasks of meme sentiment classification and meme offensive classification, we conduct experiments on the Memotion and MultiOFF datasets. The experimental results show that MeBERT achieves better performance than state-of-the-art meme classification techniques.
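The full paper specifies the actual architecture; purely as an illustration of the two components the abstract names, the following is a minimal PyTorch sketch of (1) a concept-image attention module, where KB concept embeddings act as queries over image region features so that concept-irrelevant regions are down-weighted, and (2) an outer-product tensor fusion compressed by a small convolutional stack. All module names, dimensions, and layer choices here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptImageAttention(nn.Module):
    """Illustrative sketch: concept embeddings (e.g. retrieved from a KB
    such as Probase) attend over image region features, so regions
    unrelated to the meme's concepts receive low weight."""

    def __init__(self, concept_dim: int, visual_dim: int, hidden_dim: int):
        super().__init__()
        self.q = nn.Linear(concept_dim, hidden_dim)
        self.k = nn.Linear(visual_dim, hidden_dim)
        self.v = nn.Linear(visual_dim, hidden_dim)
        self.scale = hidden_dim ** 0.5

    def forward(self, concepts: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # concepts: (B, Nc, concept_dim); regions: (B, Nr, visual_dim)
        scores = self.q(concepts) @ self.k(regions).transpose(1, 2) / self.scale
        attn = F.softmax(scores, dim=-1)        # (B, Nc, Nr)
        attended = attn @ self.v(regions)       # (B, Nc, hidden_dim)
        return attended.mean(dim=1)             # pool over concepts -> (B, hidden_dim)


class ConvTensorFusion(nn.Module):
    """Illustrative sketch: fuse textual and visual vectors via their
    outer product, then compress the interaction matrix with a small
    convolutional stack instead of one very wide dense layer."""

    def __init__(self, n_classes: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(16 * 4 * 4, n_classes)

    def forward(self, text_vec: torch.Tensor, visual_vec: torch.Tensor) -> torch.Tensor:
        # (B, d) x (B, d) -> (B, 1, d, d) cross-modal interaction "image"
        outer = torch.einsum('bi,bj->bij', text_vec, visual_vec).unsqueeze(1)
        return self.classifier(self.conv(outer).flatten(1))


# Toy usage with hypothetical shapes: 2 memes, 5 KB concepts, 49 image regions.
concepts = torch.randn(2, 5, 300)    # e.g. concept embeddings
regions = torch.randn(2, 49, 512)    # e.g. CNN region features
text_vec = torch.randn(2, 256)       # e.g. projected BERT [CLS] vector
visual = ConceptImageAttention(300, 512, 256)(concepts, regions)
logits = ConvTensorFusion(n_classes=3)(text_vec, visual)
print(logits.shape)                  # torch.Size([2, 3])
```

The outer product makes every text dimension interact with every visual dimension; treating that interaction matrix as a single-channel image and compressing it with convolutions keeps the parameter count far below a flattened dense layer, which is one plausible reading of "deep convolution tensor fusion".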
Acknowledgement
This work is supported by the Chongqing Research Program of Basic Research and Frontier Technology under Grant No. cstc2019jcyj-msxmX0033.
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Zhong, Q., Wang, Q., Liu, J.: Combining knowledge and multi-modal fusion for meme classification. In: Þór Jónsson, B., et al. (eds.) MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol. 13141. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98358-1_47