Abstract

Fake content has grown at an incredible rate over the past few years. The spread of social media and online platforms makes its large-scale dissemination increasingly accessible to malicious actors. In parallel, as fake image generation methods have become more widespread, many Deep Learning-based detection techniques have been proposed. Most of these methods extract salient features from RGB images and use a binary classifier to decide whether an image is fake or real.

In this paper, we propose DepthFake, a study of how to improve classical RGB-based approaches with depth maps. The depth information is extracted from RGB images with recent monocular depth estimation techniques. We demonstrate the effective contribution of depth maps to the deepfake detection task on robust pre-trained architectures. The proposed RGBD approach achieves an average improvement of \(3.20\%\), and up to \(11.7\%\) for some deepfake attacks, with respect to standard RGB architectures on the FaceForensics++ dataset.
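As a rough illustration of what an RGBD input can look like, the sketch below appends a depth map to an RGB frame as a fourth channel. It is only one plausible reading of the approach: the abstract does not specify how depth and color are fused, so the channel-wise concatenation, the min-max depth normalization, the 224×224 input size, and the helper names `normalize_depth` and `to_rgbd` are all assumptions made for this example. The depth map is random data standing in for the output of a monocular depth estimator.

```python
import numpy as np


def normalize_depth(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a depth map to [0, 1]; a constant map becomes all zeros."""
    depth = depth.astype(np.float32)
    span = float(depth.max() - depth.min())
    if span == 0.0:
        return np.zeros_like(depth)
    return (depth - depth.min()) / span


def to_rgbd(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Stack an (H, W, 3) uint8 RGB image and an (H, W) depth map into (H, W, 4)."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if depth.shape != rgb.shape[:2]:
        raise ValueError("depth must have shape (H, W) matching rgb")
    rgb_scaled = rgb.astype(np.float32) / 255.0  # assumed color scaling to [0, 1]
    depth_channel = normalize_depth(depth)[..., None]  # add a trailing channel axis
    return np.concatenate([rgb_scaled, depth_channel], axis=-1)


# Synthetic stand-ins: a random frame and a random "estimated" depth map.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
depth = rng.random((224, 224)).astype(np.float32)

rgbd = to_rgbd(rgb, depth)
print(rgbd.shape)
```

A tensor built this way could then be passed to a binary real/fake classifier whose first layer accepts four input channels. How the pre-trained RGB weights are adapted to the extra channel is not covered by the abstract and is left open here.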

Notes

  1. https://www.bbc.com/news/technology-60780142
  2. https://www.tensorflow.org/
  3. http://dlib.net/

Acknowledgments

This work has been partially supported by the Sapienza University of Rome project RM12117A56C08D64 2022-2024.

Author information

Corresponding author

Correspondence to Irene Amerini.

Copyright information

© 2023 Springer Nature Switzerland AG

Cite this paper

Maiano, L., Papa, L., Vocaj, K., Amerini, I. (2023). DepthFake: A Depth-Based Strategy for Detecting Deepfake Videos. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13646. Springer, Cham. https://doi.org/10.1007/978-3-031-37745-7_2

  • DOI: https://doi.org/10.1007/978-3-031-37745-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37744-0

  • Online ISBN: 978-3-031-37745-7

  • eBook Packages: Computer Science, Computer Science (R0)
