ABSTRACT
In this paper, we report on perceptual experiments indicating that there are distinct and quantitatively measurable differences in the way we visually perceive genuine versus face-swapped videos.
Recent progress in deep learning has made face-swapping techniques a powerful tool for creative purposes, but also a means for unethical forgeries. Currently, it remains unclear why people are misled, and which indicators they use to recognize potential manipulations. Here, we conduct three perceptual experiments covering four aspects: the conspicuousness of visual artifacts, viewing behavior measured via eye tracking, recognition accuracy for different video lengths, and the assessment of emotions.
Our experiments show that responses differ distinctly when watching manipulated as opposed to original faces, from which we derive perceptual cues to recognize face swaps. Because they are grounded in physiologically measurable signals, our findings yield valuable insights that may also inform advanced algorithmic detection.
Towards Understanding Perceptual Differences between Genuine and Face-Swapped Videos