DOI: 10.1145/3487553.3524650
Research article

Leveraging Intra and Inter Modality Relationship for Multimodal Fake News Detection

Published: 16 August 2022

Abstract

Recent years have witnessed massive growth in the proliferation of fake news online. User-generated content is a blend of text and visual information, giving rise to different variants of fake news. As a result, researchers have begun targeting multimodal methods for fake news detection. Existing methods capture high-level information from different modalities and jointly model them to reach a decision. Given multiple input modalities, we hypothesize that not all modalities are equally responsible for decision-making. Hence, this paper presents a novel architecture that effectively identifies and suppresses information from weaker modalities and extracts relevant information from the strong modality on a per-sample basis. We also establish intra-modality relationships by extracting fine-grained image and text features. We conduct extensive experiments on real-world datasets to show that our approach outperforms the state-of-the-art by an average of 3.05% and 4.525% on accuracy and F1-score, respectively. We also release the code, implementation details, and model checkpoints for the community's interest.



    Published In

    WWW '22: Companion Proceedings of the Web Conference 2022
    April 2022
    1338 pages
    ISBN:9781450391306
    DOI:10.1145/3487553

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Fragment Embedding
    2. Multimodal Fake News Detection
    3. Multiplicative Fusion

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '22: The ACM Web Conference 2022
    April 25 - 29, 2022
    Virtual Event, Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


    Cited By

    • (2025) DCCMA-Net: Disentanglement-based cross-modal clues mining and aggregation network for explainable multimodal fake news detection. Information Processing & Management 62(4), 104089. https://doi.org/10.1016/j.ipm.2025.104089
    • (2025) Multimodal dual perception fusion framework for multimodal affective analysis. Information Fusion 115, 102747. https://doi.org/10.1016/j.inffus.2024.102747
    • (2025) Knowledge-aware multimodal pre-training for fake news detection. Information Fusion 114, 102715. https://doi.org/10.1016/j.inffus.2024.102715
    • (2025) Cross-modal augmentation for few-shot multimodal fake news detection. Engineering Applications of Artificial Intelligence 142, 109931. https://doi.org/10.1016/j.engappai.2024.109931
    • (2025) Hmltnet: multi-modal fake news detection via hierarchical multi-grained features fused with global latent topic. Neural Computing and Applications. https://doi.org/10.1007/s00521-024-10924-6
    • (2024) Natural language-centered inference network for multi-modal fake news detection. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2542-2550. https://doi.org/10.24963/ijcai.2024/281
    • (2024) Multi-modal sarcasm detection based on dual generative processes. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2279-2287. https://doi.org/10.24963/ijcai.2024/252
    • (2024) TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer. In 2024 27th International Conference on Information Fusion (FUSION), 1-8. https://doi.org/10.23919/FUSION59988.2024.10706486
    • (2024) Contrastive Learning Based on Feature Enhancement for Multi-modal Fake News Detection. In 2024 43rd Chinese Control Conference (CCC), 7610-7615. https://doi.org/10.23919/CCC63176.2024.10661417
    • (2024) GS2F: Multimodal Fake News Detection Utilizing Graph Structure and Guided Semantic Fusion. ACM Transactions on Asian and Low-Resource Language Information Processing 24(2), 1-22. https://doi.org/10.1145/3708536
