Emergence of deepfakes and video tampering detection approaches: A survey

Published in: Multimedia Tools and Applications

Abstract

Digital content, particularly digital video recorded at a specific angle, provides a truthful picture of reality, but the widespread proliferation of easy-to-use content editing software casts doubt on its authenticity. Recently, an Artificial Intelligence (AI) based content altering mechanism known as deepfake has become popular on social media platforms, wherein anyone can produce a video that purports to show the behaviour of another person who is not actually there. Depending on the type of manipulation performed, different types of deepfakes are described in this paper. Moreover, because digital content is relied on as trustworthy evidence and must not become a vehicle for misinformation, its integrity and authenticity are of utmost concern. This paper presents a survey of state-of-the-art video integrity verification techniques, with special emphasis on emerging deepfake video detection approaches. Given the advances in creating ever more realistic deepfake videos, this review aims to facilitate the development of more generalized detection methods through a thorough discussion of research trends in deepfake detection.

Notes

  1. GAN: Generative Adversarial Network

  2. http://www.fakeapp.com/

  3. United States v Beeler, 62 F Supp. 2d 136 (July 1, 1999, United States District Court, D. Maine).

  4. Dolan v State of Florida, 743 So. 2d 544 (July 21, 1999, Court of Appeal of Florida, Fourth District).

  5. Defense Advanced Research Projects Agency

  6. Coarse-to-Fine Deep Convolutional Neural Network

  7. Residual Network

  8. This dataset is available as a part of FaceForensics.

  9. DeepFake Detection Challenge

  10. Deep Neural Network

  11. https://www.ncbi.nlm.nih.gov/pubmed/9399231.

  12. http://bionumbers.hms.harvard.edu/bionumber.aspx?id=100706&ver=0.

  13. CNN: Convolutional Neural Network

  14. LRCN: Long-Term Recurrent CNN, a combination of CNN and LSTM

  15. LSTM: Long Short-Term Memory

  16. VAE: Variational AutoEncoder

  17. Progressive Growing GAN

  18. https://github.com/EricGzq/Hybrid-Fake-Face-Dataset

  19. Gated Recurrent Unit

  20. Root Mean Square Energy

  21. Available at: https://www.descript.com/lyrebird-ai?source=lyrebird

    Table 15 Lip-sync deepfake detection techniques (A: Accuracy, EER: Equal Error Rate)

  22. Available at: https://www.asvspoof.org/

  23. Available at: https://github.com/resemble-ai/Resemblyzer

References

  1. Adami N, Signoroni A, Leonardi R (2007) State-of-the-art and trends in scalable video compression with wavelet-based approaches. IEEE Trans Circ Syst Video Technol 17(9):1238–1255

    Article  Google Scholar 

  2. Afchar D, Nozick V, Yamagishi J, Echizen I (2018) Mesonet: a compact facial video forgery detection network. In: 2018 IEEE International workshop on information forensics and security (WIFS). IEEE, pp 1–7

  3. Agarwal S, El-Gaaly T, Farid H, Lim S N (2020) Detecting deep-fake videos from appearance and behavior. arXiv:2004.14491

  4. Agarwal S, Farid H (2021) Detecting deep-fake videos from aural and oral dynamics

  5. Agarwal S, Farid H, Gu Y, He M, Nagano K, Li H (2019) Protecting world leaders against deep fakes. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 38–45

  6. Ajder H Deepfake threat intelligence: a statistics snapshot from june 2020. http://deeptracelabs.com/deepfake-threat-intelligence-a-statistics-snapshot-from-june-2020/

  7. Al-Sanjary O I, Ahmed A A, Sulong G (2016) Development of a video tampering dataset for forensic investigation. Forensic Sci Int 266:565–572

    Article  Google Scholar 

  8. Amerini I, Galteri L, Caldelli R, Del Bimbo A (2019) Deepfake video detection through optical flow based cnn. In: Proceedings of the IEEE international conference on computer vision workshops, pp 0–0

  9. Anina I, Zhou Z, Zhao G, Pietikäinen M (2015) Ouluvs2: a multi-view audiovisual database for non-rigid mouth motion analysis. In: 2015 11Th IEEE international conference and workshops on automatic face and gesture recognition (FG), vol 1. IEEE, pp 1–5

  10. APTLY: Audio processing techniques lab at york. http://bil.eecs.yorku.ca/aptly-lab./

  11. Aslani S, Mahdavi-Nasab H (2013) Optical flow based moving object detection and tracking for traffic surveillance. Int J Electr Comput Eng 7(9):1252–1256

    Google Scholar 

  12. Baddar W J, Gu G, Lee S, Ro Y M (2017) Dynamics transfer gan:, Generating video by transferring arbitrary temporal dynamics from a source video to a single target image. Accessed 5 May 2021. arXiv:1712.03534

  13. Baidu text-to-speech system. https://cloud.baidu.com/product/speech/tts

  14. Baltrušaitis T, Robinson P, Morency LP (2016) Openface: an open source facial behavior analysis toolkit. In: 2016 IEEE Winter conference on applications of computer vision (WACV). IEEE, pp 1–10

  15. Barker J (2013) The grid audiovisual sentence corpus, available at: http://spandh.dcs.shef.ac.uk/gridcorpus/

  16. Bidokhti A, Ghaemmaghami S (2015) Detection of regional copy/move forgery in mpeg videos using optical flow. In: 2015 The international symposium on artificial intelligence and signal processing (AISP). IEEE, pp 13–17

  17. Bonettini N, Cannas E D, Mandelli S, Bondi L, Bestagini P, Tubaro S (2020)

  18. Bregler C, Covell M, Slaney M (1997) Video rewrite: Driving visual speech with audio. In: Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pp 353–360

  19. Bromley J, Guyon I, LeCun Y, Säckinger E, Shah R (1994) Signature verification using a” siamese” time delay neural network. In: Advances in neural information processing systems, pp 737–744

  20. Caldelli R, Galteri L, Amerini I, Del Bimbo A (2021) Optical flow based cnn for detection of unlearnt deepfake manipulations. Pattern Recogn Lett 146:31–37

    Article  Google Scholar 

  21. Chakravarty P, Tuytelaars T (2016) Cross-modal supervision for learning active speaker detection in video. In: European conference on computer vision. Springer, pp 285–301

  22. Chan C, Ginosar S, Zhou T, Efros A A (2019) Everybody dance now. In: Proceedings of the IEEE international conference on computer vision, pp 5933–5942

  23. Chao J, Jiang X, Sun T (2012) A novel video inter-frame forgery model detection scheme based on optical flow consistency. In: International workshop on digital watermarking. Springer, pp 267–281

  24. Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details:, Delving deep into convolutional nets. arXiv:1405.3531

  25. Chen H, Chandrasekar V, Tan H, Cifelli R (2019) Rainfall estimation from ground radar and trmm precipitation radar using hybrid deep neural networks. Geophysical Research Letters

  26. Chen H, Wo Y, Han G (2018) Multi-granularity geometrically robust video hashing for tampering detection. Multimed Tools Appl 77(5):5303–5321

    Article  Google Scholar 

  27. Chen T, Kumar A, Nagarsheth P, Sivaraman G, Khoury E (2020) Generalization of audio deepfake detection. In: Proceedings of the Odyssey 2020 the speaker and language recognition workshop, pp 132–137

  28. Chen T Q, Rubanova Y, Bettencourt J, Duvenaud D. K (2018) Neural ordinary differential equations. In: Advances in neural information processing systems, pp 6571–6583

  29. Cheung G K, Baker S, Hodgins J, Kanade T (2004) Markerless human motion transfer. In: Proceedings of the 2nd international symposium on 3d data processing, visualization and transmission, 2004. 3DPVT 2004. IEEE, pp 373–378

  30. Chingovska I, Anjos A, Marcel S (2012) On the effectiveness of local binary patterns in face anti-spoofing. In: 2012 BIOSIG-proceedings of the international conference of biometrics special interest group (BIOSIG). IEEE, pp 1–7

  31. Chintha A, Thai B, Sohrawardi S J, Bhatt K, Hickerson A, Wright M, Ptucha R (2020) Recurrent convolutional structures for audio spoof and video deepfake detection. IEEE J Sel Top Signal Process 14(5):1024–1037

    Article  Google Scholar 

  32. Cho W, Choi S, Park D. K, Shin I, Choo J (2019) Image-to-image translation via group-wise deep whitening-and-coloring transformation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 10639–10647

  33. Choi Y, Choi M, Kim M, Ha J W, Kim S, Choo J (2018) Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8789–8797

  34. Choi Y, Choi M, Kim M, Ha J W, Kim S, Choo J (2018) Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8789–8797

  35. Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258

  36. Chugh K, Gupta P, Dhall A, Subramanian R (2020)

  37. Chung J S, Zisserman A (2016) Lip reading in the wild. In: Asian conference on computer vision. Springer, pp 87–103

  38. Chung J S, Zisserman A (2016) Out of time: automated lip sync in the wild. In: Asian conference on computer vision. Springer, pp 251–263

  39. Ciftci U A, Demir I (2019) Fakecatcher:, Detection of synthetic portrait videos using biological signals. arXiv:1901.02212

  40. Cole S (2017) Ai-assisted fake porn is here and we’re all fucked https://www.vice.com/en_us/article/gydydm/gal-gadot-fake-ai-porn

  41. collection, D.: Xiph.org video test media. Accessed 5 May 2021. https://media.xiph.org/video/derf/

  42. Cozzolino D, Rössler A, Thies J, Nießner M, Verdoliva L (2020) Id-reveal:, Identity-aware deepfake video detection. arXiv:2012.02512

  43. D’Amiano L, Cozzolino D, Poggi G, Verdoliva L (2018) A patchmatch-based dense-field algorithm for video copy–move detection and localization. IEEE Trans Circ Syst Video Technol 29(3):669–682

    Article  Google Scholar 

  44. De Roover C, De Vleeschouwer C, Lefebvre F, Macq B (2005) Robust video hashing based on radial projections of key frames. IEEE Trans Signal Process 53(10):4020–4037

    Article  MathSciNet  Google Scholar 

  45. Demir I, Ciftci U A (2021) Where do deep fakes look? synthetic face detection via gaze tracking. arXiv:2101.01165

  46. (2019) Dessa: Detecting audio deepfakes with ai. available at:. https://medium.com/dessa-news/detecting-audio-deepfakes-f2edfd8e2b35

  47. Ding X, Zhang D (2019) Detection of motion-compensated frame-rate up-conversion via optical flow-based prediction residue. Optik p 163766

  48. Dolhansky B, Bitton J, Pflaum B, Lu J, Howes R, Wang M, Ferrer C C (2020) The deepfake detection challenge dataset. arXiv:2006.07397

  49. Dolhansky B, Howes R, Pflaum B, Baram N, Ferrer C C (2019) The deepfake detection challenge (dfdc) preview dataset. arXiv:1910.08854

  50. Dong Q, Yang G, Zhu N (2012) A mcea based passive forensics scheme for detecting frame-based video tampering. Digit Investig 9(2):151–159

    Article  Google Scholar 

  51. Dufour N (2019) Google ai blog. contributing data to deepfake detection research. Accessed 5 May 2021. https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html

  52. Durall R, Keuper M, Keuper J (2020) Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7890–7899

  53. Durall R, Keuper M, Pfreundt F. J, Keuper J (2019) Unmasking deepfakes with simple features. arXiv:1911.00686

  54. Esser P, Haux J, Milbich T, et al. (2018) Towards learning a realistic rendering of human behavior. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 0–0

  55. Feng D, Lu X, Lin X (2020) Deep detection for face manipulation. In: International conference on neural information processing. Springer, pp 316–323

  56. Fernandes S, Raj S, Ortiz E, Vintila I, Salter M, Urosevic G, Jha S (2019) Predicting heart rate variations of deepfake videos using neural ode. In: Proceedings of the IEEE international conference on computer vision workshops, pp 0–0

  57. Fernando T, Fookes C, Denman S, Sridharan S (2019) Exploiting human social cognition for the detection of fake and fraudulent faces via memory networks. arXiv:1911.07844

  58. Garg R, Varna A L, Hajj-Ahmad A, Wu M (2013) “seeing” enf: power-signature-based timestamp for digital multimedia via optical sensing and signal processing. IEEE Trans Inf Forensics Secur 8(9):1417–1432

    Article  Google Scholar 

  59. Garrido P, Valgaerts L, Sarmadi H, Steiner I, Varanasi K, Perez P, Theobalt C (2015) Vdub: Modifying face video of actors for plausible visual alignment to a dubbed audio track. In: Computer graphics forum, vol 34. Wiley Online Library, pp 193–204

  60. Grisham S (2018) Stephanie grisham on twitter. tampering performed on white house secretary’s video https://twitter.com/PressSec/status/1060374680991883265

  61. Guan H, Kozak M, Robertson E, Lee Y, Yates A N, Delgado A, Zhou D, Kheyrkhah T, Smith J, Fiscus J (2019) Mfc datasets: Large-scale benchmark datasets for media forensic challenge evaluation. In: 2019 IEEE Winter applications of computer vision workshops (WACVW). IEEE, pp 63–72

  62. Guan W, Wang W, Dong J, Peng B, Tan T (2021) Robust face-swap detection based on 3d facial shape information. arXiv:2104.13665

  63. Guarnera L, Giudice O, Battiato S (2020) Deepfake detection by analyzing convolutional traces. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 666–667

  64. Güera D, Baireddy S, Bestagini P, Tubaro S, Delp E J (2019) We need no pixels:, Video manipulation detection using stream descriptors. arXiv:1906.08743

  65. Güera D, Delp E J (2018) Deepfake video detection using recurrent neural networks. In: 2018 15Th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6

  66. Guo Z, Yang G, Chen J, Sun X (2020) Fake face detection via adaptive residuals extraction network. arXiv:2005.04945

  67. Haliassos A, Vougioukas K, Petridis S, Pantic M (2020) Lips don’t lie:, A generalisable and robust approach to face forgery detection. arXiv:2012.07657

  68. Hasan H R, Salah K (2019) Combating deepfake videos using blockchain and smart contracts. IEEE Access 7:41596–41606

    Article  Google Scholar 

  69. He Z, Zuo W, Kan M, Shan S, Chen X (2019) Attgan: Facial attribute editing by only changing what you want. IEEE Trans Image Process 28 (11):5464–5478

    Article  MathSciNet  Google Scholar 

  70. He Z, Zuo W, Kan M, Shan S, Chen X (2019) Attgan: Facial attribute editing by only changing what you want. IEEE Trans Image Process 28 (11):5464–5478

    Article  MathSciNet  Google Scholar 

  71. Hecker C, Raabe B, Enslow R. W, DeWeese J, Maynard J, van Prooijen K (2008) Real-time motion retargeting to highly varied user-created morphologies. ACM Transactions on Graphics (TOG) 27(3):1–11

    Article  Google Scholar 

  72. Hernandez-Ortega J, Tolosana R, Fierrez J, Morales A (2020) Deepfakeson-phys:, Deepfakes detection based on heart rate estimation. arXiv:2010.00400

  73. Horn B K, Schunck B G (1981) Determining optical flow. Artificial intelligence 17(1–3):185–203

    Article  Google Scholar 

  74. Hsieh C K, Chiu C C, Su P C (2018) Video forensics for detecting shot manipulation using the information of deblocking filtering. In: 2018 IEEE 42Nd annual computer software and applications conference (COMPSAC), vol 2. IEEE, pp 353–358

  75. Huang Y, Juefei-Xu F, Wang R, Xie X, Ma L, Li J, Miao W, Liu Y, Pu G (2020) Fakelocator:, Robust localization of gan-based face manipulations via semantic segmentation networks with bells and whistles. arXiv:2001.09598

  76. Jeon H, Bang Y, Woo S S (2020) Fdftnet:, Facing off fake images using fake detection fine-tuning network. arXiv:2001.01265

  77. Jiang L, Wu W, Li R, Qian C, Loy C C (2020) Deeperforensics-1.0:, A large-scale dataset for real-world face forgery detection. arXiv:2001.03024

  78. Jr E O (2019) Thieves used audio deepfake of a ceo to steal $243,000 https://www.vice.com/en_in/article/d3a7qa/thieves-used-audio-deep-fake-of-a-ceo-to-steal-dollar243000

  79. Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of gans for improved quality, stability, and variation. arXiv:1710.10196

  80. Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4401–4410

  81. Karras T, Laine S, Aittala M, Hellsten J, Lehtinen J, Aila T (2019) Analyzing and improving the image quality of stylegan. arXiv:1912.04958

  82. Khalid H, Woo S S (2020) Oc-fakedect: Classifying deepfakes using one-class variational autoencoder. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 656–657

  83. Khalil S S, Youssef S M, Saleh SN (2021) icaps-dfake: an integrated capsule-based model for deepfake image and video detection. Future Internet 13(4):93

    Article  Google Scholar 

  84. Khan S A, Artusi A, Dai H (2021)

  85. Khodabakhsh A, Ramachandra R, Raja K, Wasnik P, Busch C (2018) Fake face detection methods: Can they be generalized?. In: 2018 International conference of the biometrics special interest group (BIOSIG). IEEE, pp 1–6

  86. Kingma D P, Dhariwal P (2018) Glow: Generative flow with invertible 1x1 convolutions. In: Advances in neural information processing systems, pp 10215–10224

  87. Kingma D P, Welling M (2013) Auto-encoding variational bayes. arXiv:1312.6114

  88. Kingra S, Aggarwal N, Singh R. D (2016) Video inter-frame forgery detection: A survey. Indian J Sci Technol 9(44)

  89. Kingra S, Aggarwal N, Singh R D (2017) Inter-frame forgery detection in h. 264 videos using motion and brightness gradients. Multimed Tools Appl 76(24):25767–25786

    Article  Google Scholar 

  90. Kobayashi K, Toda T (2018) Sprocket: Open-source voice conversion software. In: Odyssey, pp 203–210

  91. Kobayashi M, Okabe T, Sato Y (2010) Detecting forgery from static-scene video based on inconsistency in noise level functions. IEEE Trans Inf Forensics Secur 5(4):883–892

    Article  Google Scholar 

  92. Kohli A, Gupta A (2021) Detecting deepfake, faceswap and face2face facial forgeries using frequency cnn. Multimedia Tools and Applications, pp 1–18

  93. Korshunov P, Halstead M, Castan D, Graciarena M, McLaren M, Burns B, Lawson A, Marcel S (2019) Tampered speaker inconsistency detection with phonetically aware audio-visual features. In: International conference on machine learning, CONF

  94. Korshunov P, Marcel S (2018) Deepfakes:, a new threat to face recognition? assessment and detection. arXiv:1812.08685

  95. Korshunov P, Marcel S (2018) Speaker inconsistency detection in tampered video. In: 2018 26Th european signal processing conference (EUSIPCO). IEEE, pp 2375–2379

  96. Kumar A, Bhavsar A, Verma R (2020) Detecting deepfakes with metric learning. In: 2020 8Th international workshop on biometrics and forensics (IWBF). IEEE, pp 1–6

  97. Kumar N, Kaur N, Gupta D (2020) Major convolutional neural networks in image classification: a survey. In: Proceedings of International Conference on IoT Inclusive Life (ICIIL 2019), NITTTR Chandigarh, India. Springer, pp 243–258

  98. Kumar N, Kaur N, Gupta D (2020) Red green blue depth image classification using pre-trained deep convolutional neural network. Pattern Recognit Image Anal 30(3):382–390

    Article  Google Scholar 

  99. Kumar P, Vatsa M, Singh R (2020) Detecting face2face facial reenactment in videos. arXiv:2001.07444

  100. Laptev I, Marszałek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions from movies

  101. Lee S, Tariq S, Kim J, Woo S. S (2021) Tar:, Generalized forensic framework to detect deepfakes using weakly supervised learning. arXiv:2105.06117

  102. Lee S, Yoo C D (2006) Video fingerprinting based on centroids of gradient orientations. In: 2006 IEEE International conference on acoustics speech and signal processing proceedings, vol 2. IEEE, pp II–II

  103. Lee S, Yoo C D (2008) Robust video fingerprinting based on affine covariant regions. In: 2008 IEEE International conference on acoustics, speech and signal processing. IEEE, pp 1237–1240

  104. Li H, Hu L, Wei L, Nagano K, Jaewoo S, Fursund J, Saito S Avatar digitization from a single image for real-time rendering (2020). US Patent 10,535,163

  105. Li L, Bao J, Zhang T, Yang H, Chen D, Wen F, Guo B (2019) Face x-ray for more general face forgery detection. arXiv:1912.13458

  106. Li M, Monga V (2012) Robust video hashing via multilinear subspace projections. IEEE Transactions on Image Processing 21(10):4397–4409

    Article  MathSciNet  Google Scholar 

  107. Li R, Liu Z, Zhang Y, Li Y, Fu Z (2018) Noise-level estimation based detection of motion-compensated frame interpolation in video sequences. Multimedia Tools and Applications 77(1):663–688

    Article  Google Scholar 

  108. Li X, Lang Y, Chen Y, Mao X, He Y, Wang S, Xue H, Lu Q (2020) Sharp multiple instance learning for deepfake video detection. arXiv:2008.04585

  109. Li Y, Chang M. C, Lyu S (2018) In ictu oculi:, Exposing ai generated fake face videos by detecting eye blinking. arXiv:1806.02877

  110. Li Y, Yang X, Sun P, Qi H, Lyu S (2019) Celeb-df:, A new dataset for deepfake forensics. arXiv:1909.12962

  111. Liu M, Ding Y, Xia M, Liu X, Ding E, Zuo W, Wen S (2019) Stgan: a unified selective transfer network for arbitrary image attribute editing. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3673–3682

  112. Liu Y, Guan Q, Zhao X, Cao Y (2018) Image forgery localization based on multi-scale convolutional neural networks. In: Proceedings of the 6th ACM workshop on information hiding and multimedia security, pp 85–90

  113. Liu Z, Luo P, Wang X, Tang X (2015) Deep learning face attributes in the wild. In: Proceedings of the IEEE international conference on computer vision, pp 3730–3738

  114. Long C, Basharat A, Hoogs A (2019) A coarse-to-fine deep convolutional neural network framework for frame duplication detection and localization in forged videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 1–10

  115. Lucas B. D, Kanade T et al (1981) An iterative image registration technique with an application to stereo vision

  116. Malekesmaeili M, Fatourechi M, Ward R K (2009) Video copy detection using temporally informative representative images. In: 2009 International conference on machine learning and applications. IEEE, pp 69–74

  117. Maras M H, Alexandrou A (2019) Determining authenticity of video evidence in the age of artificial intelligence and in the wake of deepfake videos. The Int J Evid Proof 23(3):255–262

    Article  Google Scholar 

  118. Mase K (1991) Recognition of facial expression from optical flow. IEICE Trans Inf Syst 74(10):3474–3483

    Google Scholar 

  119. Masi I, Killekar A, Mascarenhas RM, Gurudatt S. P, AbdAlmageed W (2020) Two-branch recurrent network for isolating deepfakes in videos. arXiv:2008.03412

  120. Matern F, Riess C, Stamminger M (2019) Exploiting visual artifacts to expose deepfakes and face manipulations. In: 2019 IEEE Winter applications of computer vision workshops (WACVW). IEEE, pp 83–92

  121. Mehra A (2020) Deepfake detection using capsule networks with long short-term memory networks. Master’s thesis, University of Twente

  122. Milani S, Bestagini P, Tagliasacchi M, Tubaro S (2012) Multiple compression detection for video sequences. In: 2012 IEEE 14Th international workshop on multimedia signal processing (MMSP). IEEE, pp 112–117

  123. Mirsky Y, Lee W (2021) The creation and detection of deepfakes: a survey. ACM Computing Surveys (CSUR) 54(1):1–41

    Article  Google Scholar 

  124. Mittal T, Bhattacharya U, Chandra R, Bera A, Manocha D (2020) Emotions don’t lie:, A deepfake detection method using audio-visual affective cues. arXiv:2003.06711

  125. Mohammadi SH (2019) Text to speech synthesis using deep neural network with constant unit length spectrogram. US Patent 10,186,252

  126. Montserrat D M, Hao H, Yarlagadda S K, Baireddy S, Shao R, Horváth J, Bartusiak E, Yang J, Güera D, Zhu F et al (2020) Deepfakes detection with automatic face weighting. arXiv:2004.12027

  127. Nagothu D, Chen Y, Blasch E, Aved A, Zhu S (2019) Detecting malicious false frame injection attacks on surveillance systems at the edge using electrical network frequency signals. Sensors 19(11):2424

    Article  Google Scholar 

  128. Nagothu D, Schwell J, Chen Y, Blasch E, Zhu S (2019) A study on smart online frame forging attacks against video surveillance system. In: Sensors and systems for space applications XII, vol 11017. International Society for Optics and Photonics, p 110170L

  129. Nguyen H H, Fang F, Yamagishi J, Echizen I (2019) Multi-task learning for detecting and segmenting manipulated facial images and videos. arXiv:1906.06876

  130. Nguyen H H, Yamagishi J, Echizen I (2019) Capsule-forensics: Using capsule networks to detect forged images and videos. In: ICASSP 2019-2019 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2307–2311

  131. Nguyen H M, Derakhshani R (2020) Eyebrow recognition for identifying deepfake videos. In: 2020 International conference of the biometrics special interest group (BIOSIG). IEEE, pp 1–5

  132. Nguyen T T, Nguyen C M, Nguyen D T, Nguyen D T, Nahavandi S (2019) Deep learning for deepfakes creation and detection. arXiv:1909.11573

  133. Nguyen X H, Tran T S, Nguyen K D, Truong D T, et al. (2021) Learning spatio-temporal features to detect manipulated facial videos created by the deepfake techniques. Forensic Science International: Digital Investigation 36:301108

    Google Scholar 

  134. Nirkin Y, Keller Y, Hassner T (2019) Fsgan: Subject agnostic face swapping and reenactment. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 7184–7193

  135. Nirkin Y, Wolf L, Keller Y, Hassner T (2020) Deepfake detection based on the discrepancy between the face and its context. arXiv:2008.12262

  136. Noguchi A, Yanai K (2010) A surf-based spatio-temporal feature for feature-fusion-based action recognition. In: European conference on computer vision. Springer, pp 153–167

  137. Oord Avd, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) Wavenet:, A generative model for raw audio. arXiv:1609.03499

  138. Oostveen J, Kalker T, Haitsma J (2002) Feature extraction and a database strategy for video fingerprinting. In: International conference on advances in visual information systems. Springer, pp 117–128

  139. Ouyang J, Liu Y, Shu H (2017) Robust hashing for image authentication using sift feature and quaternion zernike moments. Multimed Tools Appl 76(2):2609–2626

    Article  Google Scholar 

  140. Papadopoulou O, Zampoglou M, Papadopoulos S, Kompatsiaris Y, Teyssou D (2018) Invid fake video corpus v2. 0 (version 2.0) Dataset on Zenodo

  141. Parkhi O M, Vedaldi A, Zisserman A (2015) Deep face recognition

  142. Posters B (2018) Bill posters on instagram. artificially generated video of mark zuckerberg https://twitter.com/PressSec/status/1060374680991883265

  143. Project A (2017) Ami corpus download. available at: http://groups.inf.ed.ac.uk/ami/download/

  144. Project R Tools for digital forensics. http://www.rewindproject.eu/

  145. Qadir G, Yahaya S, Ho AT (2012) Surrey university library for forensic analysis (sulfa) of video content

  146. Qi H, Guo Q, Juefei-Xu F, Xie X, Ma L, Feng W, Liu Y, Zhao J (2020) Deeprhythm: exposing deepfakes with attentional visual heartbeat rhythms. In: Proceedings of the 28th ACM international conference on multimedia, pp 4318–4327

  147. Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2018) Faceforensics: A large-scale video dataset for forgery detection in human faces. arXiv:1803.09179

  148. Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2019) Faceforensics++: Learning to detect manipulated facial images. arXiv:1901.08971

  149. Roy S, Sun Q (2007) Robust hash for detecting and localizing image tampering. In: 2007 IEEE International conference on image processing, vol 6. IEEE, pp VI–117

  150. Sabir E, Cheng J, Jaiswal A, AbdAlmageed W, Masi I, Natarajan P (2019) Recurrent convolutional strategies for face manipulation detection in videos. Interfaces (GUI) 3:1

    Google Scholar 

  151. Saikia N (2015) Perceptual hashing in the 3d-dwt domain. In: 2015 International conference on green computing and internet of things (ICGCIot). IEEE, pp 694–698

  152. Sanderson C (2019) Vidtimit audio-video dataset. available at: http://conradsanderson.id.au/vidtimit/

  153. Saunders J, Comerford A, Williams G (2019) Detecting deep fakes with mice: Machines vs biology https://i.blackhat.com/USA-19/wednesday/us-19-williams-detecting-deep-Fakes-With-Mice-wp.pdf

  154. Saxena S, Subramanyam A, Ravi H (2016) Video inpainting detection and localization using inconsistencies in optical flow. In: 2016 IEEE Region 10 conference (TENCON). IEEE, pp 1361–1365

  155. Seeling P, Reisslein M (2001) Video traces research group http://trace.eas.asu.edu/

  156. Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618–626

  157. Shang Z, Xie H, Zha Z, Yu L, Li Y, Zhang Y (2021) Prrnet: Pixel-region relation network for face forgery detection. Pattern Recogn 116:107950

    Article  Google Scholar 

  158. Shen J, Pang R, Weiss R J, Schuster M, Jaitly N, Yang Z, Chen Z, Zhang Y, Wang Y, Skerrv-Ryan R et al (2018) Natural tts synthesis by conditioning wavenet on mel spectrogram predictions. In: 2018 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4779–4783

  159. Singh R D, Aggarwal N (2017) Detection of upscale-crop and splicing for digital video authentication. Digit Investig 21:31–52

    Article  Google Scholar 

  160. Singh RD, Aggarwal N (2017) Optical flow and prediction residual based hybrid forensic system for inter-frame tampering detection. Journal of Circuits, Systems and Computers 26(07):1750107

    Article  Google Scholar 

  161. Singh R D, Aggarwal N (2018) Video content authentication techniques: a comprehensive survey. Multimed Syst 24(2):211–240

    Article  Google Scholar 

  162. Song F, Tan X, Liu X, Chen S (2014) Eyes closeness detection from still images with multi-scale histograms of principal oriented gradients. Pattern Recogn 47(9):2825–2838

    Article  Google Scholar 

  163. Sowmya K, Chennamma H (2015) A survey on video forgery detection. Int J Comput Eng Appl 9(2):17–27

    Google Scholar 

  164. Stehouwer J, Dang H, Liu F, Liu X, Jain A (2019) On the detection of digital face manipulation. arXiv:1910.01717

  165. Su Y, Xu J (2010) Detection of double-compression in mpeg-2 videos. In: 2010 2Nd international workshop on intelligent systems and applications. IEEE, pp 1–4

  166. Sun K, Zhao Y, Jiang B, Cheng T, Xiao B, Liu D, Mu Y, Wang X, Liu W, Wang J (2019) High-resolution representations for labeling pixels and regions. arXiv:1904.04514

  167. Sun Q, Liu Y, Chua T S, Schiele B (2019) Meta-transfer learning for few-shot learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 403–412

  168. Sun X, Wu B, Chen W (2020) Identifying invariant texture violation for robust deepfake detection. arXiv:2012.10580

  169. Suwajanakorn S, Seitz S M, Kemelmacher-Shlizerman I (2017) Synthesizing obama: learning lip sync from audio. ACM Transactions on Graphics (TOG) 36(4):1–13

  170. Tachibana H, Uenoyama K, Aihara S (2018) Efficiently trainable text-to-speech system based on deep convolutional networks with guided attention. In: 2018 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4784–4788

  171. Tamgade S N, Bora V R (2009) Motion vector estimation of video image by pyramidal implementation of lucas kanade optical flow. In: 2009 Second international conference on emerging trends in engineering & technology. IEEE, pp 914–917

  172. Tan M, Le Q V (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv:1905.11946

  173. Tariq S, Lee S, Woo S S (2020) A convolutional lstm based residual network for deepfake video detection. arXiv:2009.07480

  174. Thies J, Elgharib M, Tewari A, Theobalt C, Nießner M (2019) Neural voice puppetry: Audio-driven facial reenactment. arXiv:1912.05566

  175. Thies J, Zollhöfer M, Nießner M (2019) Deferred neural rendering: Image synthesis using neural textures. ACM Transactions on Graphics (TOG) 38(4):1–12

  176. Thies J, Zollhöfer M, Stamminger M, Theobalt C, Nießner M (2016) Face2face: Real-time face capture and reenactment of rgb videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2387–2395

  177. Tian Y, Pei K, Jana S, Ray B (2018) Deeptest: Automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th international conference on software engineering. ACM, pp 303–314

  178. Todisco M, Wang X, Vestman V, Sahidullah M, Delgado H, Nautsch A, Yamagishi J, Evans N, Kinnunen T, Lee K A (2019) Asvspoof 2019: Future horizons in spoofed and fake audio detection. arXiv:1904.05441

  179. Tolosana R, Vera-Rodriguez R, Fierrez J, Morales A, Ortega-Garcia J (2020) Deepfakes and beyond: A survey of face manipulation and fake detection. arXiv:2001.00179

  180. TRECVID: Trec video retrieval evaluation. http://trecvid.nist.gov/

  181. Tulyakov S, Liu M Y, Yang X, Kautz J (2018) Mocogan: Decomposing motion and content for video generation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1526–1535

  182. Verdoliva L (2020) Media forensics and deepfakes: an overview. arXiv:2001.06564

  183. Vincent J (2018) Jordan peele use ai to make barack obama deliver a psa about fake news. https://www.theverge.com/tldr/2018/4/17/17247334/ai-fake-news-video-barack-obama-jordan-peele-buzzfeed

  184. Wahab A W A, Bagiwa M A, Idris M Y I, Khan S, Razak Z, Ariffin M R K (2014) Passive video forgery detection techniques: a survey. In: 2014 10th international conference on information assurance and security. IEEE, pp 29–34

  185. Wan L, Wang Q, Papir A, Moreno I L (2018) Generalized end-to-end loss for speaker verification. In: 2018 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4879–4883

  186. Wang J, Wu Z, Chen J, Jiang Y G (2021) M2tr: Multi-modal multi-scale transformers for deepfake detection. arXiv:2104.09770

  187. Wang Q, Li Z, Zhang Z, Ma Q (2014) Video inter-frame forgery identification based on optical flow consistency. Sensors & Transducers 166(3):229

  188. Wang R, Juefei-Xu F, Huang Y, Guo Q, Xie X, Ma L, Liu Y (2020) Deepsonar: Towards effective and robust detection of ai-synthesized fake voices. arXiv:2005.13770

  189. Wang R, Juefei-Xu F, Ma L, Xie X, Huang Y, Wang J, Liu Y (2020) Fakespotter: a simple yet robust baseline for spotting ai-synthesized fake faces. In: International joint conference on artificial intelligence (IJCAI)

  190. Wang S Y, Wang O, Zhang R, Owens A, Efros A A (2020) Cnn-generated images are surprisingly easy to spot... for now. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol 7

  191. Wang T C, Liu M Y, Zhu J Y, Liu G, Tao A, Kautz J, Catanzaro B (2018) Video-to-video synthesis. arXiv:1808.06601

  192. Wang W, Farid H (2006) Exposing digital forgeries in video by detecting double mpeg compression. In: Proceedings of the 8th workshop on Multimedia and security. ACM, pp 37–47

  193. Wang W, Farid H (2009) Exposing digital forgeries in video by detecting double quantization. In: Proceedings of the 11th ACM workshop on Multimedia and security. ACM, pp 39–48

  194. Wang W, Jiang X, Wang S, Wan M, Sun T (2013) Identifying video forgery process using optical flow. In: International workshop on digital watermarking. Springer, pp 244–257

  195. Wang Y, Skerry-Ryan R, Stanton D, Wu Y, Weiss R J, Jaitly N, Yang Z, Xiao Y, Chen Z, Bengio S et al (2017) Tacotron: Towards end-to-end speech synthesis. arXiv:1703.10135

  196. Wheatley T, Weinberg A, Looser C, Moran T, Hajcak G (2011) Mind perception: Real but not artificial faces sustain neural activity beyond the n170/vpp. PloS One 6(3)

  197. Wiles O, Koepke A, Zisserman A (2018) Self-supervised learning of a facial attribute embedding from video. arXiv:1808.06882

  198. Wodajo D, Atnafu S (2021) Deepfake video detection using convolutional vision transformer. arXiv:2102.11126

  199. Xie W, Nagrani A, Chung J S, Zisserman A (2019) Utterance-level aggregation for speaker recognition in the wild. In: ICASSP 2019-2019 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5791–5795

  200. Xu F, Liu Y, Stoll C, Tompkin J, Bharaj G, Dai Q, Seidel H P, Kautz J, Theobalt C (2011) Video-based characters: creating new human performances from a multi-view video database. In: ACM SIGGRAPH 2011 Papers, pp 1–10

  201. Yang X, Li Y, Lyu S (2019) Exposing deep fakes using inconsistent head poses. In: ICASSP 2019-2019 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 8261–8265

  202. Yoo D G, Kang S J, Kim Y H (2013) Direction-select motion estimation for motion-compensated frame rate up-conversion. J Disp Technol 9(10):840–850

  203. Zampoglou M, Markatopoulou F, Mercier G, Touska D, Apostolidis E, Papadopoulos S, Cozien R, Patras I, Mezaris V, Kompatsiaris I (2019) Detecting tampered videos with multimedia forensics and deep learning. In: International conference on multimedia modeling. Springer, pp 374–386

  204. Zhang X, Li H, Qi Y, Leow W K, Ng T K (2006) Rain removal in video by combining temporal and chromatic properties. In: 2006 IEEE International conference on multimedia and expo. IEEE, pp 461–464

  205. Zhang Z, Robinson D, Tepper J (2018) Detecting hate speech on twitter using a convolution-gru based deep neural network. In: European semantic web conference. Springer, pp 745–760

  206. Zhao T, Xu X, Xu M, Ding H, Xiong Y, Xia W (2020) Learning to recognize patch-wise consistency for deepfake detection. arXiv:2012.09311

  207. Zhao Y, Wang S, Feng G, Tang Z (2010) A robust image hashing method based on zernike moments. J Comput Inf Syst 6(3):717–725

  208. Zhu B, Fang H, Sui Y, Li L (2020) Deepfakes for medical video de-identification: Privacy protection and diagnostic information preservation. In: Proceedings of the AAAI/ACM conference on ai, ethics, and society, pp 414–420

Acknowledgments

This work was carried out at the Design Innovation Center, Panjab University, Chandigarh, India, established by the Ministry of Education, Government of India.

Author information

Corresponding author

Correspondence to Staffy Kingra.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kingra, S., Aggarwal, N. & Kaur, N. Emergence of deepfakes and video tampering detection approaches: A survey. Multimed Tools Appl (2022). https://doi.org/10.1007/s11042-022-13100-x

  • DOI: https://doi.org/10.1007/s11042-022-13100-x
