DOI: 10.1145/3503161.3551595
research-article

A Combination of Visual-Semantic Reasoning and Text Entailment-based Boosting Algorithm for Cheapfake Detection

Published: 10 October 2022

ABSTRACT

The misuse of real photographs with conflicting captions in news items is one form of out-of-context (OOC) media misuse. To detect OOC content given a pair of news captions and their attached image, one must determine the truthfulness of each statement and evaluate whether the triplet (the image and its two captions) refers to the same event. This paper presents a new method for the OOC media detection task introduced in the ACM MM'22 Grand Challenge on Detecting Cheapfakes. For Task 1 (detecting conflicting image-caption triplets), our approach uses bottom-up attention with visual semantic reasoning to extract global image features for comprehensive image-text matching, applies multiple natural language processing models to extract the semantic relation between the captions, and uses boosting to improve accuracy. Our method achieves 85.5% accuracy on 80% of the test set, 4.5% higher than the baseline. For Task 2 (determining whether a given (Image, Caption) pair is real or fake), we assess the veracity of the captioned image based on semantic features and the correlation between image and caption, achieving 76% accuracy. Our source code is available at https://github.com/latuanvinh1998/Cheapfakes_detection_acmmm. The Docker image for our submission is available at https://hub.docker.com/repository/docker/latuanvinh1998/acmmmcheapfakes.
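The abstract names boosting as the step that fuses the image-text matching and text-entailment signals for Task 1, but it does not state the boosting variant or the exact features used. What follows is only a minimal sketch under stated assumptions: it uses discrete AdaBoost over decision stumps as a stand-in, and the four-element feature vector (two image-caption matching scores plus NLI contradiction and entailment probabilities between the captions), the names `Stump`, `fit_adaboost` and `predict`, and the toy data are all hypothetical. In a real pipeline these features would come from upstream models such as a visual-semantic reasoning matcher and an NLI classifier, which are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-triplet features (an assumption, not the authors' feature set):
#   x[0]: image-text matching score for caption 1
#   x[1]: image-text matching score for caption 2
#   x[2]: NLI contradiction probability between caption 1 and caption 2
#   x[3]: NLI entailment probability between caption 1 and caption 2
Features = Sequence[float]


@dataclass
class Stump:
    """A one-feature threshold rule used as the weak learner."""
    feature: int      # index of the feature being thresholded
    threshold: float  # split point
    polarity: int     # +1: predict OOC (+1) when x[feature] > threshold; -1: the reverse
    alpha: float      # vote weight of this stump in the ensemble

    def predict(self, x: Features) -> int:
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def fit_adaboost(X: List[Features], y: List[int], rounds: int = 10) -> List[Stump]:
    """Discrete AdaBoost over decision stumps; y holds +1 (OOC) or -1 (not OOC)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble: List[Stump] = []
    for _ in range(rounds):
        best: Optional[Tuple[float, int, float, int]] = None  # (error, feature, threshold, polarity)
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    # Weighted error of the rule "pol if x[f] > thr else -pol".
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if (pol if xi[f] > thr else -pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect stumps
        stump = Stump(f, thr, pol, 0.5 * math.log((1 - err) / err))
        ensemble.append(stump)
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-stump.alpha * yi * stump.predict(xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Stump], x: Features) -> int:
    """Sign of the alpha-weighted vote: +1 = out-of-context, -1 = not out-of-context."""
    score = sum(s.alpha * s.predict(x) for s in ensemble)
    return 1 if score > 0 else -1


# Made-up toy vectors purely to exercise the code (not challenge data).
X = [
    [0.9, 0.8, 0.95, 0.01],  # both captions fit the image but contradict each other
    [0.8, 0.9, 0.90, 0.02],
    [0.9, 0.7, 0.05, 0.85],  # captions are consistent with each other
    [0.7, 0.8, 0.10, 0.70],
]
y = [1, 1, -1, -1]
model = fit_adaboost(X, y, rounds=5)
print([predict(model, x) for x in X])  # → [1, 1, -1, -1]
```

Boosting fits this setting because each weak rule can key on a single signal (for example, a high contradiction probability between the two captions), while the re-weighting step focuses later rounds on the triplets that earlier rules misclassify.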


Supplemental Material

MM22-mmgc44.mp4

Presentation video


    Published in

      ACM Conferences
      MM '22: Proceedings of the 30th ACM International Conference on Multimedia
      October 2022
      7537 pages
      ISBN:9781450392037
      DOI:10.1145/3503161

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      Overall Acceptance Rate: 995 of 4,171 submissions, 24%

