ABSTRACT
Pairing real photographs with conflicting captions in news items is one form of out-of-context (OOC) misuse of media. To detect OOC use in a given pair of news captions and the attached image, one must determine the truthfulness of the statements and evaluate whether the (image, caption, caption) triplet refers to the same event. This paper presents a new method for detecting OOC media, addressing the ACMMM'22 Grand Challenge on Detecting Cheapfakes. For Task 1 (i.e., detecting conflicting image-caption triplets), our approach uses bottom-up attention with visual semantic reasoning to extract global image features for comprehensive image-text matching, multiple natural language processing models to extract the semantic relations between the captions, and boosting to improve accuracy. Our method achieved 85.5% accuracy on 80% of the testing dataset, 4.5% higher than the baseline. For Task 2 (i.e., determining whether a given (Image, Caption) pair is real or fake), we assess the veracity of a captioned image based on semantic features and the correlation between image and caption, achieving 76% accuracy. Our source code is available at https://github.com/latuanvinh1998/Cheapfakes_detection_acmmm. Docker for submission is available at https://hub.docker.com/repository/docker/latuanvinh1998/acmmmcheapfakes.
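As a rough illustration of the boosting step in Task 1, the sketch below trains a small discrete AdaBoost ensemble of decision stumps on per-triplet features. It is only a sketch under stated assumptions: the abstract does not say which boosting algorithm or features the authors use, so AdaBoost is swapped in as a simple stand-in, and the three features (an image-text matching score, a caption-caption similarity, and a contradiction probability from a natural language inference model) and all numbers are hypothetical toy values, not results from the paper.

```python
import math

# Hypothetical toy features per (image, caption1, caption2) triplet:
# [image_text_match, caption_similarity, contradiction_prob]
# Labels: +1 = out-of-context (OOC), -1 = not out-of-context.
X = [
    [0.82, 0.91, 0.05],
    [0.78, 0.85, 0.10],
    [0.80, 0.30, 0.88],
    [0.75, 0.25, 0.92],
    [0.35, 0.88, 0.07],
    [0.79, 0.40, 0.70],
]
y = [-1, -1, 1, 1, -1, 1]


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    """Apply one decision stump to a feature vector."""
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost(X, y, rounds=5):
    """Discrete AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error away from 0 and 1 so the log stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps; +1 means OOC."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


ensemble = adaboost(X, y, rounds=5)
print([predict(ensemble, row) for row in X])
```

In a real pipeline the feature vectors would come from the image-text matching and language models rather than being typed in by hand, and a library implementation of boosting would normally replace this hand-written loop.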
Index Terms
A Combination of Visual-Semantic Reasoning and Text Entailment-based Boosting Algorithm for Cheapfake Detection