DOI: 10.1145/3576914.3589563

Spatial Audio Empowered Smart speakers with Xblock - A Pose-Adaptive Crosstalk Cancellation Algorithm for Free-moving Users

Published: 09 May 2023

Abstract

Smart IoT speakers, while connected over a network, currently only produce sounds that come directly from the individual devices. We envision a future where smart speakers collaboratively produce a fabric of spatial audio, capable of perceptually placing sound at a range of locations in physical space. This could provide audio cues in homes, offices, and public spaces that are flexibly linked to various positions. The perception of spatialized audio relies on binaural cues, especially the time difference and the level difference of incident sound at a user’s left and right ears. Traditional stereo speakers cannot create this spatial perception when playing binaural audio because of auditory crosstalk: each ear hears a combination of both speaker outputs. We present Xblock, a novel time-domain pose-adaptive crosstalk cancellation technique that creates a spatial audio perception over a pair of speakers using knowledge of the user’s head pose and the speaker positions. We build a prototype smart speaker IoT system empowered by Xblock, evaluate the effectiveness of Xblock through signal analysis, and discuss future perceptual user studies and other future work.
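The crosstalk problem described above can be illustrated with a small geometric sketch. This is not the Xblock algorithm; it is a simplified free-field model (no head shadowing, no room reflections) that only computes the delay and 1/r gain of each of the four speaker-to-ear paths from a head pose and two speaker positions. The function names, the 0.18 m ear spacing, and the 343 m/s speed of sound are illustrative assumptions, not values from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def ear_positions(head_xy, yaw_rad, ear_spacing=0.18):
    """Return (left_ear, right_ear) for a head at head_xy facing yaw_rad.

    Yaw is measured counterclockwise from the +x axis; ear_spacing is an
    assumed ear-to-ear distance in metres.
    """
    hx, hy = head_xy
    # The left ear sits 90 degrees counterclockwise from the facing direction.
    lx, ly = -math.sin(yaw_rad), math.cos(yaw_rad)
    half = ear_spacing / 2.0
    return (hx + half * lx, hy + half * ly), (hx - half * lx, hy - half * ly)


def path(speaker_xy, ear_xy):
    """Free-field delay (seconds) and 1/r gain from a speaker to an ear."""
    r = math.dist(speaker_xy, ear_xy)
    return r / SPEED_OF_SOUND, 1.0 / r


def acoustic_paths(speakers, head_xy, yaw_rad):
    """Map (speaker index, ear name) to (delay, gain) for all four paths."""
    left, right = ear_positions(head_xy, yaw_rad)
    return {
        (i, name): path(spk, ear)
        for i, spk in enumerate(speakers)
        for name, ear in (("L", left), ("R", right))
    }


if __name__ == "__main__":
    # Hypothetical layout: two speakers 2 m in front of a user facing +y.
    speakers = [(-1.0, 2.0), (1.0, 2.0)]
    for key, (delay, gain) in acoustic_paths(speakers, (0.0, 0.0), math.pi / 2).items():
        print(key, f"delay={delay * 1e3:.3f} ms", f"gain={gain:.3f}")
```

In this model the paths `(0, "R")` and `(1, "L")` are the crosstalk paths: each carries one speaker's signal to the opposite ear, slightly later and weaker than the direct path. Because all four delays and gains change whenever the head moves or turns, any cancellation filter built on them would have to be recomputed from the current pose.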

Published In

CPS-IoT Week '23: Proceedings of Cyber-Physical Systems and Internet of Things Week 2023
May 2023
419 pages
ISBN: 9798400700491
DOI: 10.1145/3576914

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. algorithm
2. crosstalk cancellation
3. internet of things
4. spatial audio

Qualifiers

• Research-article
• Research
• Refereed limited
