Abstract
Fake content has grown at an incredible rate over the past few years. The spread of social media and online platforms makes its large-scale dissemination increasingly accessible to malicious actors. In parallel, as fake image generation methods have become more widespread, many Deep Learning-based detection techniques have been proposed. Most of these methods extract salient features from RGB images and use a binary classifier to decide whether an image is fake or real.
In this paper, we propose DepthFake, a study on how to improve classical RGB-based approaches with depth maps. The depth information is extracted from RGB images with recent monocular depth estimation techniques. We demonstrate the effective contribution of depth maps to the deepfake detection task on robust pre-trained architectures. The proposed RGBD approach achieves an average improvement of \(3.20\%\), and up to \(11.7\%\) for some deepfake attacks, with respect to standard RGB architectures on the FaceForensics++ dataset.
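The RGBD idea described above can be illustrated with a minimal sketch. The abstract does not specify how the depth map is fused with the RGB image or how the pre-trained network is adapted, so everything below is an assumption made for illustration: the helper names `make_rgbd` and `extend_first_conv` are hypothetical, the depth map is a random stand-in for the output of a monocular depth estimator, and initialising the extra input channel with the mean of the RGB kernels is one common heuristic, not necessarily the authors' choice.

```python
import numpy as np


def make_rgbd(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Stack an RGB image (H, W, 3) and a depth map (H, W) into (H, W, 4)."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if depth.shape != rgb.shape[:2]:
        raise ValueError("depth must have shape (H, W) matching rgb")
    rgb = rgb.astype(np.float32) / 255.0  # assumes 8-bit RGB input
    depth = depth.astype(np.float32)
    span = depth.max() - depth.min()
    # Min-max normalise depth to [0, 1]; a constant map becomes all zeros.
    depth = (depth - depth.min()) / span if span > 0 else np.zeros_like(depth)
    return np.concatenate([rgb, depth[..., None]], axis=-1)


def extend_first_conv(weights_rgb: np.ndarray) -> np.ndarray:
    """Extend first-layer kernels of shape (kH, kW, 3, out) to 4 input channels.

    Assumption: the new depth channel starts from the mean of the RGB kernels,
    so the pre-trained RGB kernels are kept unchanged.
    """
    depth_kernel = weights_rgb.mean(axis=2, keepdims=True)
    return np.concatenate([weights_rgb, depth_kernel], axis=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
    depth = rng.random((224, 224))  # stand-in for an estimated depth map
    rgbd = make_rgbd(rgb, depth)
    kernels = extend_first_conv(rng.standard_normal((3, 3, 3, 32)))
    print(rgbd.shape, kernels.shape)
```

The resulting four-channel array would then be passed to a binary real/fake classifier whose first convolution accepts four input channels.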
Acknowledgments
This work has been partially supported by the Sapienza University of Rome project RM12117A56C08D64 2022-2024.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Maiano, L., Papa, L., Vocaj, K., Amerini, I. (2023). DepthFake: A Depth-Based Strategy for Detecting Deepfake Videos. In: Rousseau, JJ., Kapralos, B. (eds) Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges. ICPR 2022. Lecture Notes in Computer Science, vol 13646. Springer, Cham. https://doi.org/10.1007/978-3-031-37745-7_2
DOI: https://doi.org/10.1007/978-3-031-37745-7_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37744-0
Online ISBN: 978-3-031-37745-7
eBook Packages: Computer Science, Computer Science (R0)