
A Survey on 3D Scene Graphs: Definition, Generation and Application

  • Conference paper
  • First Online:
Robot Intelligence Technology and Applications 7 (RiTA 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 642)

Abstract

With the advancement of intelligent agents, 3D scene understanding has become one of the key tasks of computer vision. 3D scenes are challenging to represent effectively because objects form varied relationships and constantly interact with one another. A scene graph is a powerful tool for concisely representing the properties and relationships of objects in a scene, enabling various multi-modal tasks. Research on 3D scene graphs (3DSGs) is therefore attracting increasing attention. However, 3DSG research is still in its early stages and lacks a systematically organized survey. In this paper, we survey the latest advances in 3DSGs. In addition, we clarify 3DSG concepts that are currently defined in inconsistent ways, discuss their real-world applicability, and present future research directions.

J. Bae and D. Shin contributed equally to this paper.
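To make the scene-graph representation described in the abstract concrete, the following is a minimal illustrative sketch, not taken from the paper: object nodes carry a semantic label and simple 3D attributes, and directed edges carry relationship predicates, so the graph can be read out as (subject, predicate, object) triplets. All names (SceneGraph3D, ObjectNode, RelationEdge) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ObjectNode:
    """An object in the scene with a semantic label and simple 3D attributes."""
    node_id: int
    label: str                                   # e.g. "chair", "table"
    centroid: Tuple[float, float, float]         # 3D position of the object
    attributes: List[str] = field(default_factory=list)  # e.g. ["wooden", "brown"]

@dataclass
class RelationEdge:
    """A directed relationship (subject, predicate, object) between two nodes."""
    subject_id: int
    predicate: str                               # e.g. "standing on", "left of"
    object_id: int

@dataclass
class SceneGraph3D:
    """A 3D scene graph: object nodes plus labeled relationship edges."""
    nodes: Dict[int, ObjectNode] = field(default_factory=dict)
    edges: List[RelationEdge] = field(default_factory=list)

    def add_object(self, node: ObjectNode) -> None:
        self.nodes[node.node_id] = node

    def relate(self, subject_id: int, predicate: str, object_id: int) -> None:
        self.edges.append(RelationEdge(subject_id, predicate, object_id))

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return human-readable (subject, predicate, object) triplets."""
        return [(self.nodes[e.subject_id].label, e.predicate, self.nodes[e.object_id].label)
                for e in self.edges]

# Usage: a two-object scene with one spatial relationship.
g = SceneGraph3D()
g.add_object(ObjectNode(0, "cup", (0.4, 0.1, 0.9), ["ceramic"]))
g.add_object(ObjectNode(1, "table", (0.5, 0.0, 0.4), ["wooden"]))
g.relate(0, "standing on", 1)
print(g.triplets())   # [('cup', 'standing on', 'table')]
```

Richer 3DSG formulations in the literature additionally attach geometry (meshes, bounding boxes) to nodes and organize them hierarchically (building, room, object), but the node-plus-labeled-edge structure above is the common core.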


Notes

  1. Note that the terms SG generation (SGG), SG prediction, and SG construction are used interchangeably.


Acknowledgement

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00907, Development of AI Bots Collaboration Platform and Self-organizing AI).

Author information


Corresponding author

Correspondence to Ue-Hwan Kim.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bae, J., Shin, D., Ko, K., Lee, J., Kim, UH. (2023). A Survey on 3D Scene Graphs: Definition, Generation and Application. In: Jo, J., et al. Robot Intelligence Technology and Applications 7. RiTA 2022. Lecture Notes in Networks and Systems, vol 642. Springer, Cham. https://doi.org/10.1007/978-3-031-26889-2_13

