DOI: 10.1145/3544548.3581085
research-article

Spatialized Audio and Hybrid Video Conferencing: Where Should Voices be Positioned for People in the Room and Remote Headset Users?

Published: 19 April 2023

Abstract

Hybrid video calls include attendees in a conference room, who listen over loudspeakers, and remote attendees, who listen over headsets; each group has different options for rendering sound spatially. Two studies explored the listener experience with spatial audio in video calls. The first study examined the in-room experience with loudspeakers, comparing spatialization algorithms that spread voices out horizontally. The second study compared varying degrees of horizontal separation of binaurally rendered voices for a remote participant using a headset. In-room participants preferred the widest spatialization over monophonic, stereo, and stereo-binary audio on metrics related to intelligibility and helpfulness. Remote participants preferred different widths of the audio stage depending on the number of voices. In both studies, rendering sound spatially improved performance in identifying speech streams. The results indicate that spatial audio benefits both in-room and remote attendees in video calls, although in-room attendees accepted a wider audio stage than remote users did.
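The abstract refers to spreading voices horizontally across an "audio stage" whose width can vary. As an illustration only, the sketch below places N voices at evenly spaced azimuths across a chosen stage width and computes loudspeaker gains with the stereophonic tangent panning law for a conventional loudspeaker pair at ±30°. Even spacing, the ±30° loudspeaker angles, the tangent law, and the hypothetical function names `voice_azimuths` and `tangent_law_gains` are all assumptions made for this example; they are not the spatialization algorithms or the binaural renderer evaluated in the studies.

```python
import math


def voice_azimuths(num_voices: int, stage_width_deg: float) -> list[float]:
    """Evenly spread num_voices across a frontal stage of stage_width_deg degrees.

    0 deg is straight ahead; negative azimuths are to the listener's left,
    positive to the right. (Even spacing is an illustrative assumption.)
    """
    if num_voices < 1:
        raise ValueError("num_voices must be at least 1")
    if num_voices == 1:
        return [0.0]  # a single voice sits in the centre
    half = stage_width_deg / 2.0
    step = stage_width_deg / (num_voices - 1)
    return [-half + i * step for i in range(num_voices)]


def tangent_law_gains(azimuth_deg: float,
                      speaker_half_angle_deg: float = 30.0) -> tuple[float, float]:
    """(left, right) gains for a phantom source between loudspeakers at
    +/- speaker_half_angle_deg, using the stereophonic tangent law
    tan(phi) / tan(phi0) = (gR - gL) / (gR + gL), normalised so gL^2 + gR^2 = 1.
    """
    # Amplitude panning over one loudspeaker pair cannot go outside the pair, so clamp.
    clamped = max(-speaker_half_angle_deg, min(speaker_half_angle_deg, azimuth_deg))
    ratio = math.tan(math.radians(clamped)) / math.tan(math.radians(speaker_half_angle_deg))
    g_left, g_right = (1.0 - ratio) / 2.0, (1.0 + ratio) / 2.0
    norm = math.hypot(g_left, g_right)  # constant-power normalisation
    return g_left / norm, g_right / norm


if __name__ == "__main__":
    # Compare a narrow and a wider stage for three voices.
    for width in (30.0, 60.0):
        for az in voice_azimuths(3, width):
            left, right = tangent_law_gains(az)
            print(f"stage {width:>4.0f} deg  voice at {az:+6.1f} deg  L={left:.3f}  R={right:.3f}")
```

With only two loudspeakers, amplitude panning can place phantom sources only between the speakers, so azimuths beyond ±30° are clamped in this sketch; wider in-room stages would typically need more loudspeakers or a different rendering method, and headset users would instead receive binaural rendering (for example, HRTF filtering), which this sketch does not attempt.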

Supplementary Material

Pre-recorded video presentation (MP4 file: 3544548.3581085-talk-video.mp4)


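Cited By

  • (2024) There Is More to Avatars Than Visuals: Investigating Combinations of Visual and Auditory User Representations for Remote Collaboration in Augmented Reality. Proceedings of the ACM on Human-Computer Interaction 8, ISS, 540-568. DOI: 10.1145/3698148. Online publication date: 24-Oct-2024.
  • (2024) Is Distance a Modality? Multi-Label Learning for Speech-Based Joint Prediction of Attributed Traits and Perceived Distances in 3D Audio Immersive Environments. Proceedings of the 26th International Conference on Multimodal Interaction, 321-330. DOI: 10.1145/3678957.3685740. Online publication date: 4-Nov-2024.
  • (2024) Investigating the Role of Real-Time Chat Summaries in Supporting Live Streamers. Proceedings of the 50th Graphics Interface Conference, 1-12. DOI: 10.1145/3670947.3670980. Online publication date: 3-Jun-2024.
  • (2024) Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-14. DOI: 10.1145/3654777.3676424. Online publication date: 13-Oct-2024.
  • (2024) BlendScape: Enabling End-User Customization of Video-Conferencing Environments through Generative AI. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-19. DOI: 10.1145/3654777.3676326. Online publication date: 13-Oct-2024.
  • (2024) “May I Speak?”: Multi-Modal Attention Guidance in Social VR Group Conversations. IEEE Transactions on Visualization and Computer Graphics 30, 5, 2287-2297. DOI: 10.1109/TVCG.2024.3372119. Online publication date: 7-Mar-2024.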

      Published In

      CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
      April 2023
      14911 pages
      ISBN: 9781450394215
      DOI: 10.1145/3544548
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. hybrid meetings
      2. spatial audio
      3. teleconferencing

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      CHI '23

      Acceptance Rates

      Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

      Article Metrics

      • Downloads (last 12 months): 212
      • Downloads (last 6 weeks): 18
      Reflects downloads up to 20 Feb 2025
