
SRTNet: a spatial and residual based two-stream neural network for deepfakes detection

Multimedia Tools and Applications

Abstract

With the rapid development of Internet technology, the Internet has become flooded with false information, and Deepfakes, a form of visual forgery, have arguably the greatest impact on people. Mainstream public Deepfakes datasets often contain millions of frames; if only the first N frames are used to train a model, key features may be lost, while using all frames makes the model prone to overfitting and stretches training to several days, consuming substantial computational resources. We therefore propose an adaptive video frame extraction algorithm that selects the required number of frames from all video frames, reducing data redundancy while increasing feature richness. In addition, we design SRTNet, a two-stream Deepfakes detection network that combines the image spatial domain and the residual domain through a spatial stream and a residual stream. The spatial stream takes the original RGB image as input to capture high-level tampering artifacts, while the residual stream applies three sets of high-pass filters to the input image to obtain image residuals that expose low-level tampering traces. The two streams are trained in parallel and their features are concatenated, allowing the model to capture tampering cues from both the spatial and residual domains and achieve better detection performance. Experimental results show that the proposed adaptive frame extraction algorithm improves model performance, and that SRTNet outperforms previous work on mainstream Deepfakes datasets.
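The abstract only sketches the architecture, but the core idea can be illustrated with a short, hypothetical PyTorch sketch: a residual stream built from fixed high-pass filter kernels, a spatial stream operating on the raw RGB image, and concatenation of the two feature vectors before classification. The specific kernels, the ResNet-18 backbones, and the fusion layer below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal two-stream (spatial + residual) sketch, assuming a PyTorch setup.
# The three high-pass kernels, the backbones, and the classifier head are
# illustrative placeholders; the abstract only states that three sets of
# high-pass filters produce residuals and that stream features are concatenated.
import torch
import torch.nn as nn
import torchvision.models as models


class ResidualExtractor(nn.Module):
    """Fixed high-pass filtering that maps an RGB image to noise residuals."""

    def __init__(self):
        super().__init__()
        # Three example high-pass kernels (first-order, second-order, and a
        # Laplacian-like filter); real steganalysis filter banks are larger.
        k1 = torch.tensor([[0., 0., 0.], [0., -1., 1.], [0., 0., 0.]])
        k2 = torch.tensor([[0., 1., 0.], [0., -2., 0.], [0., 1., 0.]])
        k3 = torch.tensor([[-1., 2., -1.], [2., -4., 2.], [-1., 2., -1.]]) / 4.0
        kernels = torch.stack([k1, k2, k3]).unsqueeze(1)  # (3, 1, 3, 3)
        kernels = kernels.repeat(1, 3, 1, 1) / 3.0        # apply to averaged RGB
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1, bias=False)
        self.conv.weight = nn.Parameter(kernels, requires_grad=False)

    def forward(self, x):
        return self.conv(x)


class TwoStreamDetector(nn.Module):
    """Spatial stream on RGB, residual stream on high-pass residuals,
    features concatenated for a binary real/fake decision."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.residual = ResidualExtractor()
        self.spatial_backbone = models.resnet18(weights=None)   # placeholder backbone
        self.residual_backbone = models.resnet18(weights=None)  # placeholder backbone
        feat_dim = self.spatial_backbone.fc.in_features
        self.spatial_backbone.fc = nn.Identity()
        self.residual_backbone.fc = nn.Identity()
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, x):
        f_spatial = self.spatial_backbone(x)                    # high-level artifacts
        f_residual = self.residual_backbone(self.residual(x))   # low-level traces
        return self.classifier(torch.cat([f_spatial, f_residual], dim=1))


if __name__ == "__main__":
    model = TwoStreamDetector()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 2])
```

The key design point conveyed by the abstract is that the residual stream's filters suppress image content and emphasize high-frequency tampering traces, complementing the spatial stream's semantic features; late concatenation lets a single classifier weigh both kinds of evidence.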



Acknowledgements

This project is supported in part by the National Natural Science Foundation of China under Grants 62172059 and 62072055, the Hunan Provincial Natural Science Foundation of China under Grants 2020JJ4626 and 2020JJ4029, the Scientific Research Fund of the Hunan Provincial Education Department of China under Grants 19B004 and 16B004, the Opening Project of the State Key Laboratory of Information Security under Grant 2021-ZD-07, the Opening Project of the Guangdong Provincial Key Laboratory of Information Security Technology under Grant 2020B1212060078, the Postgraduate Scientific Research Innovation Project of Changsha University of Science and Technology under Grant CX2021SS76, and the “Double First-class” International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant 2018IC25.

Author information


Corresponding author

Correspondence to Xiangling Ding.

Ethics declarations

Conflict of Interests

The authors have no financial or proprietary interests in any material discussed in this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, D., Zhu, W., Ding, X. et al. SRTNet: a spatial and residual based two-stream neural network for deepfakes detection. Multimed Tools Appl 82, 14859–14877 (2023). https://doi.org/10.1007/s11042-022-13966-x

