Abstract
Recent facial manipulation techniques based on deep learning can create highly realistic faces by changing expression, attributes, or identity, or by synthesizing an entire face, a family of methods recently known as Deepfakes. The rapid spread of such applications has raised serious security concerns, and corresponding forensic techniques have been proposed to tackle this issue. However, existing techniques either rely on complex deep networks with binary classification that cannot distinguish between facial manipulation types, or on fragile hand-crafted features with unsatisfactory results. To overcome these issues, we propose a learning-based detection method built on an uncomplicated CNN, called FMD-Net, that takes dynamic textures as input. The network is also able to distinguish between facial manipulation types such as Deepfake, Face2Face, FaceSwap, and NeuralTextures. By computing the dynamic texture of each video shot, motion and appearance features are combined, which helps the network learn manipulation artifacts and yields robust performance at various compression levels. We conduct extensive experiments on benchmark datasets (FaceForensics++, DFDC, and Celeb-DF) to empirically demonstrate the superiority and effectiveness of the proposed method in both binary and multi-class classification against state-of-the-art methods.
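To make the idea concrete, the sketch below shows one simple way a frame stack can be collapsed into a joint appearance and motion map. This is illustrative only: the paper's actual dynamic-texture computation follows its cited literature (e.g., Doretto et al.; Zhao and Pietikäinen), and the per-pixel mean / frame-difference fusion here is an assumed toy stand-in, not FMD-Net's input pipeline.

```python
import numpy as np

def dynamic_texture_map(frames):
    """Collapse a (T, H, W) grayscale frame stack into an (H, W, 2) map
    fusing appearance (temporal mean) with motion (mean absolute
    frame-to-frame difference)."""
    frames = np.asarray(frames, dtype=np.float64)
    appearance = frames.mean(axis=0)                       # static texture
    motion = np.abs(np.diff(frames, axis=0)).mean(axis=0)  # temporal change
    return np.stack([appearance, motion], axis=-1)

# Toy shot: two dark frames followed by two bright frames.
shot = np.zeros((4, 2, 2))
shot[2:] = 1.0
dt = dynamic_texture_map(shot)  # shape (2, 2, 2)
```

A map like this can be fed to a 2D CNN in place of raw frames, which is the general appeal of dynamic-texture inputs: a single image-like tensor already encodes how each region moved over the shot.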
Notes
Small sequence of frames
Face detection is based on classification and regression tree (CART) analysis [31].
https://github.com/ondyari/FaceForensics
The H.264 codec was used to compress all videos, with quantization parameter 23 for light compression (C23) and 40 for strong compression (C40).
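The two compression levels in this note can be reproduced as follows. The exact ffmpeg invocation is an assumption (the note only states the codec and QP values); `h264_compress_cmd` is a hypothetical helper that just builds the command list.

```python
def h264_compress_cmd(src, dst, qp):
    """Build an ffmpeg command compressing `src` to `dst` with H.264 at a
    fixed quantization parameter. Flags are assumed, matching common
    libx264 usage, not taken from the dataset's release scripts."""
    if qp not in (23, 40):
        raise ValueError("FaceForensics++ uses QP 23 (C23) or QP 40 (C40)")
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-qp", str(qp), dst]

# Light (C23) and strong (C40) compression of the same raw video:
c23 = h264_compress_cmd("raw.mp4", "c23.mp4", 23)
c40 = h264_compress_cmd("raw.mp4", "c40.mp4", 40)
# Run with e.g. subprocess.run(c23, check=True) when ffmpeg is installed.
```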
References
Afchar D, Nozick V, Yamagishi J, Echizen I (2018) Mesonet: a compact facial video forgery detection network. In: IEEE international workshop on information forensics and security (WIFS), vol 2018. IEEE, pp 1–7
Amrani M, Hammad M, Jiang F, Wang K, Amrani A (2018) Very deep feature extraction and fusion for arrhythmias detection. Neural Comput & Applic 30(7):2047–2057
Arora M, Kumar M (2021) AutoFER: PCA and PSO based automatic facial emotion recognition. Multimed Tools Appl 80(2):3039–3049
Kim P (2017) MATLAB Deep Learning: With Machine Learning, Neural Networks and Artificial Intelligence, 1st edn. Apress
Bakas J, Naskar R, Dixit R (2019) Detection and localization of inter-frame video forgeries based on inconsistency in correlation distribution between haralick coded frames. Multimed Tools Appl 78(4):4905–4935. https://doi.org/10.1007/s11042-018-6570-8
Bansal M, Kumar M, Kumar M, Kumar K (2021) An efficient technique for object recognition using shi-tomasi corner detection algorithm. Soft Comput 25(6):4423–4432
Bayar B, Stamm MC (2016) A deep learning approach to universal image manipulation detection using a new convolutional layer. In: Proceedings of the 4th ACM workshop on information hiding and multimedia security. ACM, pp 5–10
Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin
Boylan JF (2018) Will deepfake technology destroy democracy? The New York Times. https://www.nytimes.com/2018/10/17/opinion/deep-fake-technology-democracy.html
Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Cozzolino D, Poggi G, Verdoliva L (2017) Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection. In: Proceedings of the 5th ACM Workshop on information hiding and multimedia security. ACM, pp 159–164
Dargan S, Kumar M, Ayyagari MR, Kumar G (2019) A survey of deep learning and its applications: a new paradigm to machine learning. Arch Comput Methods Eng, 1–22
Dolhansky B, Howes R, Pflaum B, Baram N, Ferrer CC (2019) The deepfake detection challenge (dfdc) preview dataset. arXiv:191008854
Doretto G, Chiuso A, Wu YN, Soatto S (2003) Dynamic textures. Int J Comput Vis 51(2):91–109
Elaskily MA, Elnemr HA, Dessouky MM, Faragallah OS (2019) Two stages object recognition based copy-move forgery detection algorithm. Multimed Tools Appl 78(11):15353–15373. https://doi.org/10.1007/s11042-018-6891-7
Fadl S, Han Q, Qiong L (2020) Exposing video inter-frame forgery via histogram of oriented gradients and motion energy image. Multidim Syst Sign Process 1–20
Fadl SM, Semary NA (2017) Robust copy–move forgery revealing in digital images using polar coordinate system. Neurocomputing 265:57–65. https://doi.org/10.1016/j.neucom.2016.11.091
Fridrich J, Kodovsky J (2012) Rich models for steganalysis of digital images. IEEE Trans Inf Forensics Secur 7(3):868–882. https://doi.org/10.1109/TIFS.2012.2190402
Fung S, Lu X, Zhang C, Li CT (2021) Deepfakeucl: Deepfake detection via unsupervised contrastive learning. arXiv:210411507
Gupta S, Mohan N, Kumar M (2020) A study on source device attribution using still images. Arch Comput Methods Eng 1–15
Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:150203167
Sitara K, Mehtre BM (2018) Detection of inter-frame forgeries in digital videos. Forensic Sci Int 289:186–206. https://doi.org/10.1016/j.forsciint.2018.04.056
Khalid H, Woo SS (2020) Oc-fakedect: Classifying deepfakes using one-class variational autoencoder. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 656–657
Korshunov P, Marcel S (2018) Deepfakes: a new threat to face recognition? assessment and detection. arXiv:181208685
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Kumar A, Kumar M, Kaur A (2021a) Face detection in still images under occlusion and non-uniform illumination. Multimed Tools Appl 80(10):14565–14590
Kumar M, Kumar M et al (2021b) Xgboost: 2d-object recognition using shape descriptors and extreme gradient boosting classifier. In: Computational methods and data engineering. Springer, pp 207–222
Kumar P, Vatsa M, Singh R (2020) Detecting face2face facial reenactment in videos. In: The IEEE winter conference on applications of computer vision (WACV)
Laws KI (1980) Textured image segmentation. Tech. rep. University of Southern California Los Angeles Image Processing INST
Li Y, Yang X, Sun P, Qi H, Lyu S (2020) Celeb-DF: A large-scale challenging dataset for deepfake forensics. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Lienhart R, Kuranov A, Pisarevsky V (2003) Empirical analysis of detection cascades of boosted classifiers for rapid object detection. In: Michaelis B, Krell G (eds) Pattern recognition. Springer, Berlin, pp 297–304
Matern F, Riess C, Stamminger M (2019) Exploiting visual artifacts to expose deepfakes and face manipulations. In: 2019 IEEE winter applications of computer vision workshops (WACVW), pp 83–92. https://doi.org/10.1109/WACVW.2019.00020
Megahed A, Han Q (2020) Face2face manipulation detection based on histogram of oriented gradients. In: 2020 IEEE 19th international conference on trust, security and privacy in computing and communications (TrustCom), pp 1260–1267. https://doi.org/10.1109/TrustCom50675.2020.00169
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 807–814
Pun CM, Liu B, Yuan XC (2016) Multi-scale noise estimation for image splicing forgery detection. J Vis Commun Image Represent 38:195–206. https://doi.org/10.1016/j.jvcir.2016.03.005
Rahmouni N, Nozick V, Yamagishi J, Echizen I (2017) Distinguishing computer graphics from natural images using convolution neural networks. In: 2017 IEEE Workshop on information forensics and security (WIFS). IEEE, pp 1–6
Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2018) Faceforensics: A large-scale video dataset for forgery detection in human faces. arXiv:180309179
Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2019) Faceforensics++: Learning to detect manipulated facial images. In: Proceedings of the IEEE international conference on computer vision, pp 1–11
Sabir E, Cheng J, Jaiswal A, AbdAlmageed W, Masi I, Natarajan P (2019) Recurrent convolutional strategies for face manipulation detection in videos. Interfaces (GUI) 3(1)
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Szummer M, Picard RW (1996) Temporal texture modeling. In: Proceedings of 3rd IEEE international conference on image processing, vol 3. IEEE, pp 823–826
Tharwat A (2018) Classification assessment methods. Applied Computing and Informatics. https://doi.org/10.1016/j.aci.2018.08.003
Wang G, Zhou J, Wu Y (2020) Exposing deep-faked videos by anomalous co-motion pattern detection. arXiv:200804848
Wu X, Xie Z, Gao Y, Xiao Y (2020) Sstnet: Detecting Manipulated faces through spatial, steganalysis and temporal features. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2952–2956
Zhang Q, Lu W, Weng J (2016) Joint image splicing detection in dct and contourlet transform domain. J Vis Commun Image Represent 40:449–458. https://doi.org/10.1016/j.jvcir.2016.07.013
Zhao G, Pietikäinen M (2007) Dynamic texture recognition using volume local binary patterns. In: Dynamical vision. Springer, Berlin, pp 165–177
Zhao G, Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Trans Pattern Anal Mach Intell 29(6):915–928
Zhou P, Han X, Morariu VI (2017) Two-stream neural networks for tampered face detection. In: 2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW). IEEE, pp 1831–1839
Acknowledgements
This work was supported by the National Natural Science Foundation of China [grant numbers 61771168, 61471141, 61361166006, 61571018, and 61531003]; Key Technology Program of Shenzhen, China, [grant number JSGG20160427185010977]; Basic Research Project of Shenzhen, China [grant number JCYJ20150513151706561].
Ethics declarations
Declaration of competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Cite this article
Megahed, A., Han, Q. Identify videos with facial manipulations based on convolution neural network and dynamic texture. Multimed Tools Appl 81, 43441–43466 (2022). https://doi.org/10.1007/s11042-022-13102-9