Abstract
Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted solely to their visual perception of the environment. We introduce audio-visual navigation for complex, acoustically and visually realistic 3D environments. By both seeing and hearing, the agent must learn to navigate to a sounding object. We propose a multi-modal deep reinforcement learning approach to train navigation policies end-to-end from a stream of egocentric audio-visual observations, allowing the agent to (1) discover elements of the geometry of the physical space indicated by the reverberating audio and (2) detect and follow sound-emitting targets. We further introduce SoundSpaces: a first-of-its-kind dataset of audio renderings based on geometrical acoustic simulations for two sets of publicly available 3D environments (Matterport3D and Replica), and we instrument Habitat to support the new sensor, making it possible to insert arbitrary sound sources in an array of real-world scanned environments. Our results show that audio greatly benefits embodied visual navigation in 3D spaces, and our work lays groundwork for new research in embodied AI with audio-visual perception. Project: http://vision.cs.utexas.edu/projects/audio_visual_navigation.
C. Chen and U. Jain contributed equally.
U. Jain: work done as an intern at Facebook AI Research.
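To make the task setup concrete: the policy maps each egocentric RGB frame, together with the binaural audio heard at the agent's current pose, to a navigation action, and is trained end-to-end with deep reinforcement learning. The sketch below is a minimal PyTorch illustration of one such audio-visual actor-critic; the module names and sizes (e.g., `AudioVisualPolicy`, `hidden_size=512`) are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class AudioVisualPolicy(nn.Module):
    """Minimal sketch of an audio-visual navigation policy.

    Hypothetical module names and sizes -- not the paper's exact architecture.
    """

    def __init__(self, num_actions: int = 4, hidden_size: int = 512):
        super().__init__()
        # Encoder for the egocentric RGB frame.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(hidden_size), nn.ReLU(),
        )
        # Encoder for the 2-channel (left/right) binaural log-spectrogram.
        self.audio_encoder = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(hidden_size), nn.ReLU(),
        )
        # Recurrent core over the fused audio-visual features.
        self.rnn = nn.GRU(2 * hidden_size, hidden_size, batch_first=True)
        # Actor (action logits) and critic (state-value) heads.
        self.actor = nn.Linear(hidden_size, num_actions)
        self.critic = nn.Linear(hidden_size, 1)

    def forward(self, rgb, spectrogram, hidden=None):
        # rgb: (B, 3, H, W); spectrogram: (B, 2, F, T)
        fused = torch.cat(
            [self.visual_encoder(rgb), self.audio_encoder(spectrogram)], dim=-1
        )
        out, hidden = self.rnn(fused.unsqueeze(1), hidden)  # one step at a time
        out = out.squeeze(1)
        return self.actor(out), self.critic(out), hidden
```

In practice such a policy would be trained with an on-policy algorithm (e.g., PPO) inside the Habitat simulator; the sketch only fixes the observation-to-action interface.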
Notes
- 1. While the algorithms could also run with ambisonic inputs, using binaural sound has the advantage of allowing human listeners to interpret our video results (see Supp. video); a minimal sketch of featurizing such binaural input follows this list.
- 2. Replica has more multi-room trajectories, where audio gives clear cues of room entrances/exits (vs. the open floor plans in Matterport3D). This may be why the AudioGoal agent (AG) outperforms the PointGoal (PG) and AudioPointGoal (APG) agents on Replica.
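As a concrete illustration of note 1, the following is one way a two-channel binaural waveform could be converted into the log-spectrogram features an agent might consume. The STFT parameters and the function name `binaural_log_spectrogram` are assumptions for the sketch, not the paper's exact preprocessing.

```python
import numpy as np
from scipy import signal


def binaural_log_spectrogram(waveform: np.ndarray, sample_rate: int = 44100,
                             nperseg: int = 512, noverlap: int = 256) -> np.ndarray:
    """Convert a (2, num_samples) left/right waveform into a (2, F, T)
    log-magnitude spectrogram. Hypothetical preprocessing, not the paper's recipe."""
    assert waveform.ndim == 2 and waveform.shape[0] == 2, "expected shape (2, num_samples)"
    channels = []
    for channel in waveform:  # left ear, then right ear
        _, _, stft = signal.stft(channel, fs=sample_rate,
                                 nperseg=nperseg, noverlap=noverlap)
        channels.append(np.log1p(np.abs(stft)))  # compress dynamic range
    return np.stack(channels, axis=0).astype(np.float32)


# Example: featurize one second of (silent) binaural audio.
spec = binaural_log_spectrogram(np.zeros((2, 44100), dtype=np.float32))
print(spec.shape)  # (2, 257, ...) -- two channels of frequency-by-time bins
```

The level and time differences between the left and right channels are what carry the directional cues toward the sounding target; such a two-channel feature map could feed the audio encoder sketched earlier.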
Acknowledgements
UT Austin is supported in part by DARPA Lifelong Learning Machines. We thank Alexander Schwing, Dhruv Batra, Erik Wijmans, Oleksandr Maksymets, Ruohan Gao, and Svetlana Lazebnik for valuable discussions and support with the AI-Habitat platform.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Chen, C. et al. (2020). SoundSpaces: Audio-Visual Navigation in 3D Environments. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12351. Springer, Cham. https://doi.org/10.1007/978-3-030-58539-6_2
Print ISBN: 978-3-030-58538-9
Online ISBN: 978-3-030-58539-6