ABSTRACT
Generative adversarial networks (GANs) learn high-dimensional vector spaces (latent spaces) whose vectors can be interchangeably mapped to and from images. Recent advances have extended their ability to generate images indistinguishable from real photographs, such as faces, and, more importantly, to manipulate images through their inherent vector representations in the latent space. This interchangeability of latent vectors makes it possible to measure not only distance in the latent space but also, potentially, the human perceptual and cognitive distance between images, that is, how humans perceive and recognize them. However, it remains unclear how distance in the latent space correlates with human perception and cognition. Our studies investigated the relationship between latent vectors and human perception and cognition through psycho-visual experiments that manipulate the latent vectors of face images. In the perception study, a change-perception task examined whether participants could perceive visual changes in face images before and after moving an arbitrary distance in the latent space. In the cognition study, a face-recognition task examined whether participants could still recognize a face as the same person after moving an arbitrary distance in the latent space. Our experiments show that the distance between face images in the latent space correlates with human perception and cognition of visual changes in face imagery, and that this relationship can be modeled with a logistic function. With our methodology, distance in the latent space can be converted interchangeably into a metric of human perception and cognition, potentially leading to image processing that better reflects human perception and cognition.
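The abstract's central claim, that the probability of detecting a change follows a logistic function of latent-space distance, can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the latent-perturbation helper, the observed detection rates, and the parameter names (`d50`, `k`) are all hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def move_latent(w, distance, rng):
    """Perturb a latent vector w by a given Euclidean distance
    in a uniformly random direction (illustrative, not StyleGAN-specific)."""
    step = rng.standard_normal(w.shape)
    return w + distance * step / np.linalg.norm(step)

def logistic(d, d50, k):
    """P(change detected) after moving distance d in latent space.
    d50: distance yielding 50% detection; k: slope of the curve."""
    return 1.0 / (1.0 + np.exp(-k * (d - d50)))

# Hypothetical psychometric data: latent distances vs. observed detection rates.
distances = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
detect_rate = np.array([0.05, 0.15, 0.40, 0.65, 0.85, 0.95])

# Fit the logistic model to the (distance, detection-rate) pairs.
(d50, k), _ = curve_fit(logistic, distances, detect_rate, p0=[1.5, 2.0])

def distance_for_probability(p, d50, k):
    """Invert the fitted curve: the latent distance at which a change
    is detected with probability p."""
    return d50 + np.log(p / (1.0 - p)) / k
```

Once fitted, `distance_for_probability(0.5, d50, k)` returns the just-noticeable latent distance, giving the two-way conversion between latent distance and a perceptual metric that the abstract describes.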
Human Latent Metrics: Perceptual and Cognitive Response Correlates to Distance in GAN Latent Space for Facial Images