Recent facial manipulation techniques based on deep learning can create highly realistic faces by changing expression, attributes, or identity, or by synthesizing an entire face; such manipulations have recently come to be called Deepfakes. The rapid appearance of these applications has raised serious security concerns, and corresponding forensic techniques have been proposed to tackle the issue. However, existing techniques either rely on complex deep networks that perform binary classification and cannot distinguish between facial manipulation types, or rely on fragile hand-crafted features that give unsatisfactory results. To overcome these issues, we propose a learning-based detection method built on an uncomplicated CNN, called FMD-Net, that takes dynamic textures as input. Moreover, it can distinguish between facial manipulation types such as Deepfake, Face2Face, FaceSwap, and NeuralTexture. Using the dynamic textures of each video shot combines motion and appearance features, which helps the network learn manipulation artifacts and provides robust performance at various compression levels. We conduct extensive experiments on several benchmark datasets (FaceForensics++, DFDC, and Celeb-DF) to empirically demonstrate the superiority and effectiveness of the proposed method over state-of-the-art methods in both binary and multi-class classification.
Megahed, A., Han, Q. Identify videos with facial manipulations based on convolution neural network and dynamic texture. Multimed Tools Appl 81, 43441–43466 (2022). https://doi.org/10.1007/s11042-022-13102-9
