DOI: 10.1145/3548814.3551460

Human Latent Metrics: Perceptual and Cognitive Response Correlates to Distance in GAN Latent Space for Facial Images

Published: 22 September 2022

ABSTRACT

Generative adversarial networks (GANs) produce high-dimensional vector spaces (latent spaces) in which vectors and images can be represented interchangeably. Advances have extended their ability to computationally generate images indistinguishable from real images, such as faces, and, more importantly, to manipulate images through their inherent vector values in the latent space. This interchangeability of latent vectors makes it possible to calculate not only distance in the latent space but also the human perceptual and cognitive distance between images, that is, how humans perceive and recognize images. However, it remains unclear how distance in the latent space correlates with human perception and cognition. Our studies investigated the relationship between latent vectors and human perception or cognition through psycho-visual experiments that manipulate the latent vectors of face images. In the perception study, a change perception task examined whether participants could perceive visual changes in face images before and after moving an arbitrary distance in the latent space. In the cognition study, a face recognition task examined whether participants could recognize a face as the same even after moving an arbitrary distance in the latent space. Our experiments show that the distance between face images in the latent space correlates with human perception and cognition of visual changes in face imagery and can be modeled with a logistic function. Our methodology makes it possible to convert interchangeably between distance in the latent space and a metric of human perception and cognition, potentially leading to image processing that better reflects human perception and cognition.
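The central quantitative claim here, that the probability of perceiving or recognizing a change follows a logistic function of latent-space distance, can be illustrated with a short sketch. Everything below is hypothetical: the latent-shift helper, the 512-dimensional latent code, and the response data are illustrative stand-ins under assumed StyleGAN-style conventions, not the authors' code or data.

    # Minimal sketch (not the authors' code): move a latent code an arbitrary
    # distance along a direction, then fit a logistic psychometric function
    # mapping latent distance to the probability of a "change detected"
    # response. All names and numbers here are illustrative.
    import numpy as np
    from scipy.optimize import curve_fit

    def move_in_latent_space(w, direction, distance):
        """Shift latent code w by `distance` along a unit direction."""
        unit = direction / np.linalg.norm(direction)
        return w + distance * unit

    def logistic(d, alpha, beta):
        """P(response) at latent distance d; alpha = 50% threshold, beta = slope."""
        return 1.0 / (1.0 + np.exp(-(d - alpha) / beta))

    # Hypothetical example: shift a 512-dimensional StyleGAN-style latent code.
    rng = np.random.default_rng(0)
    w = rng.standard_normal(512)
    w_shifted = move_in_latent_space(w, rng.standard_normal(512), distance=6.0)

    # Illustrative (made-up, for demonstration only) psychophysics data:
    # latent distances tested and the fraction of trials on which
    # observers reported seeing a change.
    distances = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
    p_change = np.array([0.05, 0.15, 0.40, 0.70, 0.90, 0.97])

    (alpha, beta), _ = curve_fit(logistic, distances, p_change, p0=[6.0, 1.0])
    print(f"50% threshold: {alpha:.2f} latent units, slope: {beta:.2f}")

With a fit like this, a target detection probability can be inverted into a latent distance, which is the conversion between latent-space distance and a perceptual or cognitive metric that the abstract describes.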


Published in

SAP '22: ACM Symposium on Applied Perception 2022
September 2022, 86 pages
ISBN: 978-1-4503-9455-0
DOI: 10.1145/3548814

      Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 22 September 2022


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

Overall acceptance rate: 43 of 94 submissions, 46%
