Abstract
Vision and hearing are the dominant sensory modalities for humans. To maximise the sense of immersion in VR environments, plausible spatial audio reproduction synchronised with the visual information is essential. However, measuring the acoustic properties of an environment with audio equipment is a complicated process. In this chapter, we introduce a simple and efficient system that estimates room acoustics for plausible spatial audio rendering, using 360° cameras for real-scene reproduction in VR. A simplified 3D semantic model of the scene is estimated from the captured images using computer vision algorithms and a convolutional neural network (CNN). Spatially synchronised audio is then reproduced based on the estimated geometric and acoustic properties of the scene, and the reconstructed scenes are rendered with the synthesised spatial audio.
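To make the final rendering stage concrete, the sketch below synthesises a room impulse response (RIR) from an estimated room geometry and surface absorption, then convolves it with a dry source signal. This is a minimal illustration, not the system presented in the chapter: it assumes a shoebox room, a single frequency-independent absorption coefficient, and a textbook image-source model, whereas the chapter's pipeline estimates geometry and materials from 360° images with a CNN. All function and parameter names here are hypothetical.

```python
# Toy image-source RIR synthesis for a shoebox room (illustrative only).
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def shoebox_rir(room, src, mic, absorption, fs=48000, max_order=6, length_s=0.5):
    """Image-source RIR for a rectangular ('shoebox') room.

    room       -- (Lx, Ly, Lz) interior dimensions in metres
    src, mic   -- source / listener positions in metres
    absorption -- average energy absorption coefficient in [0, 1]
    max_order  -- highest total reflection order to include
    """
    beta = np.sqrt(1.0 - absorption)  # pressure reflection factor per bounce
    rir = np.zeros(int(fs * length_s))
    room, src, mic = (np.asarray(v, float) for v in (room, src, mic))

    # Image sources along one axis: coordinate 2*m*L + (1 - 2*p)*s crosses
    # |2*m - p| walls on that axis (p = 0/1 picks the mirrored branch).
    def axis_images(L, s):
        for m in range(-max_order, max_order + 1):
            for p in (0, 1):
                yield 2 * m * L + (1 - 2 * p) * s, abs(2 * m - p)

    for x, rx in axis_images(room[0], src[0]):
        for y, ry in axis_images(room[1], src[1]):
            for z, rz in axis_images(room[2], src[2]):
                order = rx + ry + rz
                if order > max_order:
                    continue
                d = np.linalg.norm(np.array([x, y, z]) - mic)
                t = int(round(d / SPEED_OF_SOUND * fs))
                if t < len(rir):
                    # Spherical spreading (1/d) plus one wall loss per bounce.
                    rir[t] += beta ** order / max(d, 1e-3)
    return rir

if __name__ == "__main__":
    rir = shoebox_rir(room=(6.0, 4.0, 3.0), src=(2.0, 1.5, 1.2),
                      mic=(4.0, 2.5, 1.6), absorption=0.3)
    dry = np.random.randn(48000)   # stand-in for a dry (anechoic) recording
    wet = np.convolve(dry, rir)    # reverberant, 'auralised' signal
    print(f"non-zero RIR taps: {np.count_nonzero(rir)}, output length: {len(wet)}")
```

In the actual system the shoebox dimensions and absorption values would come from the estimated 3D semantic model (geometry from the reconstruction, absorption from the recognised surface materials), and the rendering would be binaural rather than this single-channel convolution.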