R3DS: Reality-Linked 3D Scenes for Panoramic Scene Understanding

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15121)

Abstract

We introduce the Reality-linked 3D Scenes (R3DS) dataset of synthetic 3D scenes mirroring the real-world scene arrangements from Matterport3D panoramas. Compared to prior work, R3DS has more complete and densely populated scenes with objects linked to real-world observations in panoramas. R3DS also provides an object support hierarchy, and matching object sets (e.g., same chairs around a dining table) for each scene. Overall, R3DS contains 19K objects represented by 3,784 distinct CAD models from over 100 object categories. We demonstrate the effectiveness of R3DS on the Panoramic Scene Understanding task. We find that: 1) training on R3DS enables better generalization; 2) support relation prediction trained with R3DS improves performance compared to heuristically calculated support; and 3) R3DS offers a challenging benchmark for future work on panoramic scene understanding.

Acknowledgements

This work was funded in part by a CIFAR AI Chair, a Canada Research Chair, an NSERC Discovery Grant, and NSF award #2016532, and was enabled by support from WestGrid and Compute Canada. Daniel Ritchie is an advisor to Geopipe and owns equity in the company. Geopipe is a start-up developing 3D technology to build immersive virtual copies of the real world, with applications in various fields including games and architecture. We thank Madhawa Vidanapathirana, Weijie Lin, and David Han for help developing the annotation tool; Denys Iliash, Mrinal Goshalia, Brandon Robles, Paul Brown, Chloe Ye, Coco Kaleel, Elizabeth Wu, and Hannah Julius for data annotation; and Ivan Tam, Austin Wang, and Ning Wang for feedback on the paper draft.

Author information

Corresponding author

Correspondence to Qirui Wu.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 8032 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, Q., Raychaudhuri, S., Ritchie, D., Savva, M., Chang, A.X. (2025). R3DS: Reality-Linked 3D Scenes for Panoramic Scene Understanding. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15121. Springer, Cham. https://doi.org/10.1007/978-3-031-73036-8_26

  • DOI: https://doi.org/10.1007/978-3-031-73036-8_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73035-1

  • Online ISBN: 978-3-031-73036-8

  • eBook Packages: Computer Science, Computer Science (R0)
