
SRTNet: a spatial and residual based two-stream neural network for deepfakes detection

Multimedia Tools and Applications

Abstract

With the rapid development of Internet technology, the Internet has become flooded with false information, and Deepfakes, a form of visual forgery, have arguably the greatest impact on people. Mainstream public Deepfakes datasets often contain millions of frames; if only the first N frames are used to train a model, key features may be lost, while using all frames makes the model prone to overfitting and stretches training to several days, consuming substantial computational resources. We therefore propose an adaptive video frame extraction algorithm that selects the required number of frames from all video frames, reducing data redundancy while increasing feature richness. In addition, we design SRTNet, a two-stream Deepfakes detection network that combines the image spatial domain and the residual domain through a spatial stream and a residual stream. The spatial stream takes the original RGB image as input to capture high-level tampering artifacts, while the residual stream applies three sets of high-pass filters to the input image to obtain image residuals that expose low-level tampering traces. The two streams are trained in parallel and their features are concatenated, allowing the model to capture tampering cues from both the spatial and residual domains and achieve better detection performance. Experimental results show that the proposed adaptive frame extraction algorithm improves model performance, and that SRTNet outperforms previous work on mainstream Deepfakes datasets.
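The abstract only sketches the architecture, but the core idea can be illustrated with a short, hypothetical PyTorch sketch: a residual stream built from fixed high-pass filter kernels, a spatial stream operating on the raw RGB image, and concatenation of the two feature vectors before classification. The specific kernels, the ResNet-18 backbones, and the fusion layer below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal two-stream (spatial + residual) sketch, assuming a PyTorch setup.
# The three high-pass kernels, the backbones, and the classifier head are
# illustrative placeholders; the abstract only states that three sets of
# high-pass filters produce residuals and that stream features are concatenated.
import torch
import torch.nn as nn
import torchvision.models as models


class ResidualExtractor(nn.Module):
    """Fixed high-pass filtering that maps an RGB image to noise residuals."""

    def __init__(self):
        super().__init__()
        # Three example high-pass kernels (first-order, second-order, and a
        # Laplacian-like filter); real steganalysis filter banks are larger.
        k1 = torch.tensor([[0., 0., 0.], [0., -1., 1.], [0., 0., 0.]])
        k2 = torch.tensor([[0., 1., 0.], [0., -2., 0.], [0., 1., 0.]])
        k3 = torch.tensor([[-1., 2., -1.], [2., -4., 2.], [-1., 2., -1.]]) / 4.0
        kernels = torch.stack([k1, k2, k3]).unsqueeze(1)  # (3, 1, 3, 3)
        kernels = kernels.repeat(1, 3, 1, 1) / 3.0        # apply to averaged RGB
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1, bias=False)
        self.conv.weight = nn.Parameter(kernels, requires_grad=False)

    def forward(self, x):
        return self.conv(x)


class TwoStreamDetector(nn.Module):
    """Spatial stream on RGB, residual stream on high-pass residuals,
    features concatenated for a binary real/fake decision."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.residual = ResidualExtractor()
        self.spatial_backbone = models.resnet18(weights=None)   # placeholder backbone
        self.residual_backbone = models.resnet18(weights=None)  # placeholder backbone
        feat_dim = self.spatial_backbone.fc.in_features
        self.spatial_backbone.fc = nn.Identity()
        self.residual_backbone.fc = nn.Identity()
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, x):
        f_spatial = self.spatial_backbone(x)                    # high-level artifacts
        f_residual = self.residual_backbone(self.residual(x))   # low-level traces
        return self.classifier(torch.cat([f_spatial, f_residual], dim=1))


if __name__ == "__main__":
    model = TwoStreamDetector()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 2])
```

The key design point conveyed by the abstract is that the residual stream's filters suppress image content and emphasize high-frequency tampering traces, complementing the spatial stream's semantic features; late concatenation lets a single classifier weigh both kinds of evidence.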



Acknowledgements

This project is supported in part by the National Natural Science Foundation of China under Grants 62172059 and 62072055, the Hunan Provincial Natural Science Foundation of China under Grants 2020JJ4626 and 2020JJ4029, the Scientific Research Fund of the Hunan Provincial Education Department of China under Grants 19B004 and 16B004, the Opening Project of the State Key Laboratory of Information Security under Grant 2021-ZD-07, the Opening Project of the Guangdong Provincial Key Laboratory of Information Security Technology under Grant 2020B1212060078, the Postgraduate Scientific Research Innovation Project of Changsha University of Science and Technology under Grant CX2021SS76, and the “Double First-class” International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant 2018IC25.

Author information


Corresponding author

Correspondence to Xiangling Ding.

Ethics declarations

Conflict of Interests

The authors have no financial or proprietary interests in any material discussed in this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, D., Zhu, W., Ding, X. et al. SRTNet: a spatial and residual based two-stream neural network for deepfakes detection. Multimed Tools Appl 82, 14859–14877 (2023). https://doi.org/10.1007/s11042-022-13966-x

