DOI: 10.1145/3487553.3524650
Research article

Leveraging Intra and Inter Modality Relationship for Multimodal Fake News Detection

Published: 16 August 2022

Abstract

Recent years have witnessed massive growth in the proliferation of fake news online. User-generated content is a blend of text and visual information, giving rise to different variants of fake news. As a result, researchers have begun targeting multimodal methods for fake news detection. Existing methods capture high-level information from different modalities and jointly model them to reach a decision. Given multiple input modalities, we hypothesize that not all modalities are equally responsible for decision-making. Hence, this paper presents a novel architecture that effectively identifies and suppresses information from weaker modalities and extracts relevant information from the strong modality on a per-sample basis. We also establish intra-modality relationships by extracting fine-grained image and text features. We conduct extensive experiments on real-world datasets to show that our approach outperforms the state-of-the-art by an average of 3.05% and 4.525% on accuracy and F1-score, respectively. We also release the code, implementation details, and model checkpoints for the community's interest.



    Published In

    WWW '22: Companion Proceedings of the Web Conference 2022
    April 2022
    1338 pages
    ISBN:9781450391306
    DOI:10.1145/3487553

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Fragment Embedding
    2. Multimodal Fake News Detection
    3. Multiplicative Fusion

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '22: The ACM Web Conference 2022
    April 25 - 29, 2022
    Virtual Event, Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


    Cited By

    • (2025) DCCMA-Net: Disentanglement-based cross-modal clues mining and aggregation network for explainable multimodal fake news detection. Information Processing & Management 62(4), 104089. https://doi.org/10.1016/j.ipm.2025.104089
    • (2025) Multimodal dual perception fusion framework for multimodal affective analysis. Information Fusion 115, 102747. https://doi.org/10.1016/j.inffus.2024.102747
    • (2025) Knowledge-aware multimodal pre-training for fake news detection. Information Fusion 114, 102715. https://doi.org/10.1016/j.inffus.2024.102715
    • (2025) Cross-modal augmentation for few-shot multimodal fake news detection. Engineering Applications of Artificial Intelligence 142, 109931. https://doi.org/10.1016/j.engappai.2024.109931
    • (2025) Hmltnet: multi-modal fake news detection via hierarchical multi-grained features fused with global latent topic. Neural Computing and Applications. https://doi.org/10.1007/s00521-024-10924-6
    • (2024) Natural language-centered inference network for multi-modal fake news detection. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2542-2550. https://doi.org/10.24963/ijcai.2024/281
    • (2024) Multi-modal sarcasm detection based on dual generative processes. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2279-2287. https://doi.org/10.24963/ijcai.2024/252
    • (2024) TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer. In 2024 27th International Conference on Information Fusion (FUSION), 1-8. https://doi.org/10.23919/FUSION59988.2024.10706486
    • (2024) Contrastive Learning Based on Feature Enhancement for Multi-modal Fake News Detection. In 2024 43rd Chinese Control Conference (CCC), 7610-7615. https://doi.org/10.23919/CCC63176.2024.10661417
    • (2024) GS2F: Multimodal Fake News Detection Utilizing Graph Structure and Guided Semantic Fusion. ACM Transactions on Asian and Low-Resource Language Information Processing 24(2), 1-22. https://doi.org/10.1145/3708536
