Abstract
With the rapid development of Internet technology, false information has spread widely online, and Deepfakes, a form of visual forgery, have had the greatest impact on people. Existing mainstream public Deepfakes datasets often contain millions of frames. If only the first N frames are used to train a model, some key features may be lost; if all frames are used, the model overfits easily and training often takes several days, consuming substantial computational resources. We therefore propose an adaptive video frame extraction algorithm that extracts the required number of frames from all video frames, reducing data redundancy and increasing feature richness. In addition, we design SRTNet, a two-stream Deepfakes detection network that combines the image spatial domain and residual domain and consists of a spatial stream and a residual stream. The spatial stream takes the original RGB image as input to capture high-level tampering artifacts. The residual stream processes the input image with three sets of high-pass filters to obtain image residuals that capture tampering traces. The two streams are trained in parallel and their features are concatenated, so that the model captures tampering features from both the spatial and residual domains and achieves better detection performance. Experimental results show that the proposed adaptive frame extraction algorithm improves model performance, and that SRTNet achieves better results than previous work on mainstream Deepfake datasets.
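The two preprocessing ideas in the abstract can be illustrated with a minimal sketch. The abstract does not specify the adaptive extraction rule or the three sets of high-pass filters, so both parts below are stand-ins chosen for illustration: `evenly_spaced_indices` spreads the requested number of frames uniformly over the whole video (in contrast to taking the first N), and `HIGH_PASS` is a hypothetical 3x3 zero-sum kernel used to turn a grayscale image into a residual map. Neither should be read as the authors' method.

```python
def evenly_spaced_indices(total_frames, wanted):
    """Pick `wanted` frame indices spread across the whole video.

    Assumption: uniform spacing stands in for the adaptive extraction
    algorithm, whose details the abstract does not give.
    """
    if total_frames <= 0 or wanted <= 0:
        return []
    if wanted >= total_frames:
        return list(range(total_frames))
    step = total_frames / wanted
    # Take the midpoint of each of the `wanted` equal-length segments.
    return [int(step * i + step / 2) for i in range(wanted)]


# Hypothetical high-pass kernel. Its entries sum to zero, so a flat
# (constant) image region produces a zero residual.
HIGH_PASS = [
    [-1, 2, -1],
    [2, -4, 2],
    [-1, 2, -1],
]


def highpass_residual(gray, kernel=HIGH_PASS):
    """Valid-mode 2D correlation of a grayscale image (list of rows)."""
    k = len(kernel)
    rows, cols = len(gray), len(gray[0])
    out = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * gray[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out
```

For a 10-frame video and 5 requested frames, `evenly_spaced_indices(10, 5)` selects frames from every part of the clip instead of only the opening frames. In a two-stream setup, the RGB frame would feed one branch and the residual map the other, with the resulting feature vectors concatenated before classification.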
Acknowledgements
This project is supported in part by the National Natural Science Foundation of China under Grants 62172059 and 62072055, the Hunan Provincial Natural Science Foundation of China under Grants 2020JJ4626 and 2020JJ4029, the Scientific Research Fund of Hunan Provincial Education Department of China under Grants 19B004 and 16B004, the Opening Project of State Key Laboratory of Information Security under Grant 2021-ZD-07, the Opening Project of Guangdong Provincial Key Laboratory of Information Security Technology under Grant 2020B1212060078, the Postgraduate Scientific Research Innovation Project of Changsha University of Science and Technology under Grant CX2021SS76, and the “Double First-class” International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant 2018IC25.
Ethics declarations
Conflict of Interests
The authors have no financial or proprietary interests in any material discussed in this article.
About this article
Cite this article
Zhang, D., Zhu, W., Ding, X. et al. SRTNet: a spatial and residual based two-stream neural network for deepfakes detection. Multimed Tools Appl 82, 14859–14877 (2023). https://doi.org/10.1007/s11042-022-13966-x
DOI: https://doi.org/10.1007/s11042-022-13966-x