Abstract:
Fake news detection (FND) has become an essential task in social network analysis, and multimodal detection methods that combine text and images have been extensively explored over the last five years. However, the physical characteristics of images, which are clearly revealed in the frequency domain, are often ignored, so cross-modal feature extraction and interaction remain a great challenge once the frequency domain is introduced into multimodal FND. To address this issue, this article proposes a frequency-aware cross-modal interaction network (FCINet) for multimodal FND. First, a triple-branch encoder with robust feature extraction capacity is proposed to learn representations of the frequency, spatial, and text domains separately. Then, a parallel cross-modal interaction strategy is designed to fully exploit the interdependencies among the three branches and thereby facilitate multimodal FND. Finally, a combined loss function, including deep auxiliary supervision and event classification, is introduced to improve generalization under multitask training. Extensive experiments and visual analyses on two public real-world multimodal fake news datasets show that the proposed FCINet achieves excellent performance and surpasses numerous state-of-the-art methods.
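To make the described pipeline concrete, below is a minimal PyTorch sketch of the three components the abstract names: a triple-branch (frequency/spatial/text) encoder, a parallel cross-modal interaction step, and a combined loss with deep auxiliary supervision and event classification. All layer sizes, the use of a log-magnitude FFT for the frequency branch, the attention-based interaction, and the loss weights `lam` and `mu` are illustrative assumptions; the paper's actual architecture is not specified in the abstract.

```python
# Illustrative sketch only; module shapes and loss weights are assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn

class TripleBranchEncoder(nn.Module):
    """Encodes frequency, spatial, and text inputs into a shared dimension."""
    def __init__(self, dim=256, vocab=30522):
        super().__init__()
        self.spatial = nn.Sequential(  # spatial (pixel-level) branch: small CNN
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
        self.freq = nn.Sequential(     # frequency branch over FFT magnitudes
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))
        self.embed = nn.Embedding(vocab, dim)  # text branch: embed + mean-pool
        self.text_proj = nn.Linear(dim, dim)

    def forward(self, image, tokens):
        f_spa = self.spatial(image)
        # physical image cues appear in the spectrum: log-magnitude 2-D FFT
        spectrum = torch.log1p(torch.abs(torch.fft.fft2(image)))
        f_frq = self.freq(spectrum)
        f_txt = self.text_proj(self.embed(tokens).mean(dim=1))
        return f_frq, f_spa, f_txt

class FCINetSketch(nn.Module):
    def __init__(self, dim=256, n_events=10):
        super().__init__()
        self.encoder = TripleBranchEncoder(dim)
        # parallel cross-modal interaction, sketched here as self-attention
        # over the three modality tokens (an assumption on our part)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.cls = nn.Linear(3 * dim, 2)            # real / fake head
        self.event = nn.Linear(3 * dim, n_events)   # event classification head
        self.aux = nn.ModuleList(nn.Linear(dim, 2) for _ in range(3))

    def forward(self, image, tokens):
        feats = self.encoder(image, tokens)
        seq = torch.stack(feats, dim=1)             # (B, 3, dim)
        mixed, _ = self.attn(seq, seq, seq)         # cross-modal interaction
        fused = mixed.flatten(1)                    # (B, 3 * dim)
        aux_logits = [head(f) for head, f in zip(self.aux, feats)]
        return self.cls(fused), self.event(fused), aux_logits

def combined_loss(outputs, y, y_event, lam=0.5, mu=0.2):
    """Main FND loss + event classification + deep auxiliary supervision."""
    ce = nn.functional.cross_entropy
    main, event, aux = outputs
    return (ce(main, y) + lam * ce(event, y_event)
            + mu * sum(ce(a, y) for a in aux))
```

In this reading, the per-branch auxiliary heads provide the "deep auxiliary supervision" (each branch is pushed toward the FND label on its own), while the event head supplies the second task of the multitask objective.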
Published in: IEEE Transactions on Computational Social Systems (Volume 11, Issue 5, October 2024)