Multimodal Rumor Detection by Using Additive Angular Margin with Class-Aware Attention for Hard Samples

Zhou, Chenyu; Li, Xiuhong; Li, Zhe; Chen, Fan; Wang, Xiaofan; Yang, Dan; Chen, Bin; Li, Songlin

doi:10.1007/978-981-99-8429-9_27

Chenyu Zhou ORCID: orcid.org/0009-0006-9220-7501¹⁵,
Xiuhong Li ORCID: orcid.org/0000-0002-5327-0907¹⁵,
Zhe Li ORCID: orcid.org/0000-0002-0519-7434¹⁶,
Fan Chen¹⁵,
Xiaofan Wang¹⁵,
Dan Yang¹⁵,
Bin Chen¹⁷ &
…
Songlin Li ORCID: orcid.org/0009-0008-6793-5624¹⁵

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14425)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Several factors currently limit the practicality of multimodal rumor detection (MRD): incomplete feature fusion across modalities, the weak discriminative power of softmax-based losses, and the detrimental effect of hard negative samples on learning. To address these issues, we propose an MRD framework that combines a supervised contrastive loss with an additive angular margin and incorporates class-aware attention. A multi-layer fusion (MLF) module aligns and fuses token-level features from the text and image modalities to improve multimodal feature fusion. Adding an angular margin to the loss function strengthens the discriminative power of the contrastive loss, and the class-aware attention module mitigates the impact of hard negative samples on the supervised contrastive loss. Extensive experiments on three real-world multimodal datasets demonstrate that the proposed learning objective yields an embedding space that effectively separates rumors from truths. Our approach also significantly improves rumor detection performance, which helps identify rumors promptly and curtail their propagation.
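
The abstract does not give the loss formula, so the following is only a minimal sketch of one common way to add an angular margin to a supervised contrastive loss: the angle between an anchor and each same-class sample is enlarged by a margin m before the cosine is taken (an ArcFace-style penalty). The function name, the default values of `margin` and `temperature`, and the exact form of the denominator are assumptions made for illustration, not the chapter's implementation. The class-aware attention and the MLF module are not modelled here.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cosine(u, v):
    """Cosine of two unit vectors, clamped to [-1, 1] for a safe acos."""
    c = sum(a * b for a, b in zip(u, v))
    return max(-1.0, min(1.0, c))


def supcon_angular_margin_loss(embeddings, labels, margin=0.2, temperature=0.1):
    """Supervised contrastive loss with an additive angular margin (sketch).

    Hypothetical helper: for each anchor i and each positive p (same label),
    the positive logit is cos(theta_ip + margin) / temperature, while all
    other samples in the denominator keep the plain cos(theta) / temperature.
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        loss_i = 0.0
        for p in positives:
            # Penalised positive: enlarge the angle, capped at pi.
            theta = min(math.acos(_cosine(z[i], z[p])) + margin, math.pi)
            pos = math.exp(math.cos(theta) / temperature)
            # Remaining samples (other positives and all negatives), no margin.
            rest = sum(
                math.exp(_cosine(z[i], z[a]) / temperature)
                for a in range(n)
                if a != i and a != p
            )
            loss_i += -math.log(pos / (pos + rest))
        total += loss_i / len(positives)
        anchors += 1
    if anchors == 0:
        raise ValueError("no anchor has a same-class sample in the batch")
    return total / anchors


# Toy batch: two tight clusters in 2-D, labelled 0 and 1.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [0, 0, 1, 1]
print(supcon_angular_margin_loss(toy_embeddings, toy_labels))
```

With a positive margin, same-class pairs must be closer in angle to reach the same loss value as without it, which is the sense in which a margin tightens the class clusters in the embedding space.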

This work was supported by the Open Project of Key Laboratory, Xinjiang Uygur Autonomous Region (No. 2023D04079).

Author information

Authors and Affiliations

School of Information Science and Engineering, Xinjiang University, Xinjiang, China
Chenyu Zhou, Xiuhong Li, Fan Chen, Xiaofan Wang, Dan Yang & Songlin Li
Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong, Hong Kong SAR, China
Zhe Li
School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai, China
Bin Chen

Corresponding author

Correspondence to Xiuhong Li.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Zhou, C. et al. (2024). Multimodal Rumor Detection by Using Additive Angular Margin with Class-Aware Attention for Hard Samples. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_27

DOI: https://doi.org/10.1007/978-981-99-8429-9_27
Published: 24 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9
eBook Packages: Computer Science, Computer Science (R0)
