ABSTRACT
Generative adversarial networks (GANs) learn high-dimensional vector spaces (latent spaces) whose vectors can be interchangeably mapped to and from images. Recent advances have extended their ability to generate images indistinguishable from real photographs, such as faces, and, more importantly, to manipulate images through their inherent vector representations in the latent space. This interchangeability of latent vectors makes it possible to measure not only distance in the latent space but also, potentially, the human perceptual and cognitive distance between images, that is, how humans perceive and recognize them. However, it remains unclear how distance in the latent space correlates with human perception and cognition. Our studies investigated the relationship between latent vectors and human perception and cognition through psycho-visual experiments that manipulate the latent vectors of face images. In the perception study, a change-perception task examined whether participants could perceive visual changes in face images before and after moving an arbitrary distance in the latent space. In the cognition study, a face-recognition task examined whether participants could still recognize a face as the same person after moving an arbitrary distance in the latent space. Our experiments show that the distance between face images in the latent space correlates with human perception and cognition of visual changes in face imagery, and that this relationship can be modeled with a logistic function. With our methodology, distance in the latent space can be converted interchangeably into a metric of human perception and cognition, potentially leading to image processing that better reflects human perception and cognition.
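The abstract's central claim, that the probability of detecting a change follows a logistic function of latent-space distance, can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the latent-perturbation helper, the observed detection rates, and the parameter names (`d50`, `k`) are all hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def move_latent(w, distance, rng):
    """Perturb a latent vector w by a given Euclidean distance
    in a uniformly random direction (illustrative, not StyleGAN-specific)."""
    step = rng.standard_normal(w.shape)
    return w + distance * step / np.linalg.norm(step)

def logistic(d, d50, k):
    """P(change detected) after moving distance d in latent space.
    d50: distance yielding 50% detection; k: slope of the curve."""
    return 1.0 / (1.0 + np.exp(-k * (d - d50)))

# Hypothetical psychometric data: latent distances vs. observed detection rates.
distances = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
detect_rate = np.array([0.05, 0.15, 0.40, 0.65, 0.85, 0.95])

# Fit the logistic model to the (distance, detection-rate) pairs.
(d50, k), _ = curve_fit(logistic, distances, detect_rate, p0=[1.5, 2.0])

def distance_for_probability(p, d50, k):
    """Invert the fitted curve: the latent distance at which a change
    is detected with probability p."""
    return d50 + np.log(p / (1.0 - p)) / k
```

Once fitted, `distance_for_probability(0.5, d50, k)` returns the just-noticeable latent distance, giving the two-way conversion between latent distance and a perceptual metric that the abstract describes.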
Human Latent Metrics: Perceptual and Cognitive Response Correlates to Distance in GAN Latent Space for Facial Images