
How Video Meetings Change Your Expression

  • Conference paper
  • First Online:

Part of the proceedings: Computer Vision – ECCV 2024 (ECCV 2024)
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15075)

Included in the conference series: European Conference on Computer Vision (ECCV)

Abstract

Do our facial expressions change when we speak over video calls? Given two unpaired sets of videos of people, we seek to automatically find spatio-temporal patterns that are distinctive of each set. Existing methods use discriminative approaches and perform post-hoc explainability analysis. Such methods are insufficient: they cannot provide insights beyond obvious dataset biases, and their explanations are useful only if humans are themselves good at the task. Instead, we tackle the problem through the lens of generative domain translation: our method generates a detailed report of learned, input-dependent spatio-temporal features and the extent to which they vary between the two domains. We demonstrate that our method can discover behavioral differences between conversing face-to-face (F2F) and over video calls (VCs). We also show that our method applies to discovering differences in presidential communication styles. Additionally, we can predict temporal change-points in videos, which decouples expressions in an unsupervised way and increases the interpretability and usefulness of our model. Finally, because our method is generative, it can be used to transform a video call to appear as if it were recorded in a F2F setting. Experiments and visualizations show that our approach discovers a range of behaviors, taking a step towards a deeper understanding of human behavior. Video results, code, and data can be found at facet.cs.columbia.edu.
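The generative domain-translation framing in the abstract is in the spirit of unpaired translation with cycle consistency (CycleGAN-style). The sketch below is a minimal, hypothetical illustration of that idea, not the authors' actual implementation: all module names, feature dimensions, and the feature-space simplification are assumptions for illustration.

```python
# Illustrative sketch only: a cycle-consistency loss coupling two translators
# over per-frame expression features. Hypothetical stand-in, not the paper's code.
import torch
import torch.nn as nn

class Translator(nn.Module):
    """Toy feature-space translator standing in for a full video generator."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

G_vc_to_f2f = Translator()  # video-call domain -> face-to-face domain
G_f2f_to_vc = Translator()  # face-to-face domain -> video-call domain
l1 = nn.L1Loss()

def cycle_loss(x_vc: torch.Tensor, x_f2f: torch.Tensor) -> torch.Tensor:
    # A round trip through both translators should reconstruct the input,
    # so the translators are pushed to change only domain-specific behavior.
    return (l1(G_f2f_to_vc(G_vc_to_f2f(x_vc)), x_vc)
            + l1(G_vc_to_f2f(G_f2f_to_vc(x_f2f)), x_f2f))

x_vc, x_f2f = torch.randn(8, 64), torch.randn(8, 64)  # unpaired batches
print(cycle_loss(x_vc, x_f2f).item())
```

The abstract also mentions predicting temporal change-points without supervision. One simple heuristic for that idea (again an assumption for illustration, not the paper's method) is to score each frame by how much the mean expression feature shifts between the windows before and after it:

```python
# Toy unsupervised change-point heuristic over per-frame features.
import numpy as np

def change_points(feats: np.ndarray, window: int = 15, z_thresh: float = 2.0) -> np.ndarray:
    """feats: (T, D) per-frame features; returns indices of candidate change-points."""
    T = len(feats)
    scores = np.zeros(T)
    for t in range(window, T - window):
        # Magnitude of the mean-feature shift across frame t.
        scores[t] = np.linalg.norm(feats[t:t + window].mean(0)
                                   - feats[t - window:t].mean(0))
    z = (scores - scores.mean()) / (scores.std() + 1e-8)
    return np.flatnonzero(z > z_thresh)

# Synthetic check: a feature shift at frame 100 is flagged near t = 100.
feats = np.concatenate([np.zeros((100, 8)), np.ones((100, 8))])
print(change_points(feats))
```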



Acknowledgements

This research is based on work partially supported by the DARPA CCU program under contract HR001122C0034 and the National Science Foundation AI Institute for Artificial and Natural Intelligence (ARNI). PT is supported by the Apple PhD fellowship.

Author information


Corresponding author

Correspondence to Sumit Sarin.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 502 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sarin, S., Mall, U., Tendulkar, P., Vondrick, C. (2025). How Video Meetings Change Your Expression. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15075. Springer, Cham. https://doi.org/10.1007/978-3-031-72643-9_10


  • DOI: https://doi.org/10.1007/978-3-031-72643-9_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72642-2

  • Online ISBN: 978-3-031-72643-9

  • eBook Packages: Computer Science, Computer Science (R0)
