Abstract
Internet memes are widespread on social media platforms such as Twitter and Facebook. Recently, meme classification has become an active research topic, especially meme sentiment classification and meme offensive classification. Internet memes carry multi-modal information, with the meme text embedded in the meme image. Existing methods classify memes by simply concatenating global visual and textual features to generate a multi-modal representation. However, these approaches ignore the noise introduced by global visual features and the potential common information in a meme's multi-modal representation. In this paper, we propose a model for meme classification named MeBERT. Our method enhances the semantic representation of a meme by introducing conceptual information from external Knowledge Bases (KBs). To reduce noise, a concept-image attention module is designed to extract a concept-sensitive visual representation. In addition, a deep convolution tensor fusion module is built to integrate multi-modal information effectively. To verify the effectiveness of the model on the tasks of meme sentiment classification and meme offensive classification, we conduct experiments on the Memotion and MultiOFF datasets. The experimental results show that MeBERT achieves better performance than state-of-the-art meme classification techniques.
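The full paper specifies the actual architecture; purely as an illustration of the two components the abstract names, the following is a minimal PyTorch sketch of (1) a concept-image attention module, where KB concept embeddings act as queries over image region features so that concept-irrelevant regions are down-weighted, and (2) an outer-product tensor fusion compressed by a small convolutional stack. All module names, dimensions, and layer choices here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptImageAttention(nn.Module):
    """Illustrative sketch: concept embeddings (e.g. retrieved from a KB
    such as Probase) attend over image region features, so regions
    unrelated to the meme's concepts receive low weight."""

    def __init__(self, concept_dim: int, visual_dim: int, hidden_dim: int):
        super().__init__()
        self.q = nn.Linear(concept_dim, hidden_dim)
        self.k = nn.Linear(visual_dim, hidden_dim)
        self.v = nn.Linear(visual_dim, hidden_dim)
        self.scale = hidden_dim ** 0.5

    def forward(self, concepts: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # concepts: (B, Nc, concept_dim); regions: (B, Nr, visual_dim)
        scores = self.q(concepts) @ self.k(regions).transpose(1, 2) / self.scale
        attn = F.softmax(scores, dim=-1)        # (B, Nc, Nr)
        attended = attn @ self.v(regions)       # (B, Nc, hidden_dim)
        return attended.mean(dim=1)             # pool over concepts -> (B, hidden_dim)


class ConvTensorFusion(nn.Module):
    """Illustrative sketch: fuse textual and visual vectors via their
    outer product, then compress the interaction matrix with a small
    convolutional stack instead of one very wide dense layer."""

    def __init__(self, n_classes: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(16 * 4 * 4, n_classes)

    def forward(self, text_vec: torch.Tensor, visual_vec: torch.Tensor) -> torch.Tensor:
        # (B, d) x (B, d) -> (B, 1, d, d) cross-modal interaction "image"
        outer = torch.einsum('bi,bj->bij', text_vec, visual_vec).unsqueeze(1)
        return self.classifier(self.conv(outer).flatten(1))


# Toy usage with hypothetical shapes: 2 memes, 5 KB concepts, 49 image regions.
concepts = torch.randn(2, 5, 300)    # e.g. concept embeddings
regions = torch.randn(2, 49, 512)    # e.g. CNN region features
text_vec = torch.randn(2, 256)       # e.g. projected BERT [CLS] vector
visual = ConceptImageAttention(300, 512, 256)(concepts, regions)
logits = ConvTensorFusion(n_classes=3)(text_vec, visual)
print(logits.shape)                  # torch.Size([2, 3])
```

The outer product makes every text dimension interact with every visual dimension; treating that interaction matrix as a single-channel image and compressing it with convolutions keeps the parameter count far below a flattened dense layer, which is one plausible reading of "deep convolution tensor fusion".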
Acknowledgement
This work is supported by the Chongqing Research Program of Basic Research and Frontier Technology under Grant No. cstc2019jcyj-msxmX0033.
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Zhong, Q., Wang, Q., Liu, J.: Combining knowledge and multi-modal fusion for meme classification. In: Þór Jónsson, B., et al. (eds.) MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol. 13141. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98358-1_47