DOI: 10.1145/3596671.3598578
research-article

Hear We Are: Spatial Audio Benefits Perceptions of Turn-Taking and Social Presence in Video Meetings

Published: 20 September 2023

Abstract

Relative to in-person meetings, conversations in video meetings have long been reported as stilted. Spatial audio in video meetings can simulate the way we hear the world by separating audio streams based on speakers’ virtual locations. We report on a within-subject experiment in which 75 employees of a global technology company completed two group survival tasks with spatial audio enabled or disabled. Spatial audio increased perceptions of interactivity, shared space, and ease of understanding. Women experienced effects for social presence while men experienced effects for turn-taking. We discuss implications for inclusion, task performance, fatigue, and future research.

Published In

CHIWORK '23: Proceedings of the 2nd Annual Meeting of the Symposium on Human-Computer Interaction for Work
June 2023
164 pages
ISBN: 9798400708077
DOI: 10.1145/3596671
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Honorable Mention

Author Tags

  1. fatigue
  2. gender
  3. social presence
  4. spatial audio
  5. task outcomes
  6. turn-taking
  7. video meetings

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CHIWORK 2023
