Abstract
Do our facial expressions change when we speak over video calls? Given two unpaired sets of videos of people, we seek to automatically find spatio-temporal patterns that are distinctive of each set. Existing methods use discriminative approaches and perform post-hoc explainability analysis. Such methods are insufficient: they cannot provide insights beyond obvious dataset biases, and their explanations are useful only if humans themselves are good at the task. Instead, we tackle the problem through the lens of generative domain translation: our method generates a detailed report of learned, input-dependent spatio-temporal features and the extent to which they vary between the domains. We demonstrate that our method can discover behavioral differences between conversing face-to-face (F2F) and over video calls (VCs). We also show the applicability of our method to discovering differences in presidential communication styles. Additionally, our method predicts temporal change-points in videos that decouple expressions in an unsupervised way, increasing the interpretability and usefulness of our model. Finally, because our method is generative, it can transform a video call to appear as if it were recorded in an F2F setting. Experiments and visualizations show that our approach discovers a range of behaviors, taking a step towards a deeper understanding of human behavior. Video results, code, and data can be found at facet.cs.columbia.edu.
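The abstract frames the method as generative domain translation between two unpaired video sets. For orientation only, the sketch below shows the generic cycle-consistent translation recipe (in the spirit of CycleGAN) applied to hypothetical per-frame expression features; the feature dimension, MLP translators, and loss weights are all illustrative assumptions, not the paper's architecture or its feature-report mechanism.

```python
# Minimal sketch of cycle-consistent unpaired domain translation (CycleGAN-style).
# Everything here -- feature dimension, MLP translators, loss weights -- is an
# illustrative assumption, not the paper's actual model.
import torch
import torch.nn as nn

FEAT_DIM = 128  # assumed dimensionality of per-frame expression features


def mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))


G_f2v = mlp(FEAT_DIM, FEAT_DIM)  # hypothetical F2F -> video-call translator
G_v2f = mlp(FEAT_DIM, FEAT_DIM)  # hypothetical video-call -> F2F translator
D_v = mlp(FEAT_DIM, 1)           # discriminator for the video-call domain
D_f = mlp(FEAT_DIM, 1)           # discriminator for the F2F domain

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
opt_g = torch.optim.Adam(list(G_f2v.parameters()) + list(G_v2f.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(list(D_v.parameters()) + list(D_f.parameters()), lr=2e-4)


def generator_step(x_f, x_v, lam=10.0):
    """Adversarial loss (fool both discriminators) plus cycle-consistency loss."""
    fake_v, fake_f = G_f2v(x_f), G_v2f(x_v)
    pred_v, pred_f = D_v(fake_v), D_f(fake_f)
    adv = bce(pred_v, torch.ones_like(pred_v)) + bce(pred_f, torch.ones_like(pred_f))
    # Translating to the other domain and back should reconstruct the input.
    cyc = l1(G_v2f(fake_v), x_f) + l1(G_f2v(fake_f), x_v)
    loss = adv + lam * cyc
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return loss.item()


def discriminator_step(x_f, x_v):
    """Train both discriminators to separate real from translated features."""
    with torch.no_grad():  # generators are frozen for this update
        fake_v, fake_f = G_f2v(x_f), G_v2f(x_v)

    def d_loss(D, real, fake):
        pr, pf = D(real), D(fake)
        return bce(pr, torch.ones_like(pr)) + bce(pf, torch.zeros_like(pf))

    loss = d_loss(D_v, x_v, fake_v) + d_loss(D_f, x_f, fake_f)
    opt_d.zero_grad(); loss.backward(); opt_d.step()
    return loss.item()


# Toy usage with random stand-in batches (32 frames per domain).
x_f, x_v = torch.randn(32, FEAT_DIM), torch.randn(32, FEAT_DIM)
discriminator_step(x_f, x_v)
generator_step(x_f, x_v)
```

The cycle term is what makes unpaired data usable: without paired F2F/VC recordings of the same conversation, requiring round-trip reconstruction discourages the translators from discarding input-specific expression content.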
Acknowledgements
This research is based on work partially supported by the DARPA CCU program under contract HR001122C0034 and the National Science Foundation AI Institute for Artificial and Natural Intelligence (ARNI). PT is supported by an Apple PhD Fellowship.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sarin, S., Mall, U., Tendulkar, P., Vondrick, C. (2025). How Video Meetings Change Your Expression. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15075. Springer, Cham. https://doi.org/10.1007/978-3-031-72643-9_10
DOI: https://doi.org/10.1007/978-3-031-72643-9_10
Print ISBN: 978-3-031-72642-2
Online ISBN: 978-3-031-72643-9