Skip to main content

Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

3D Scene Graph Prediction (SGP) aims to recognize the objects and predict their semantic and spatial relationships in a 3D scene. Existing methods either exploit context information or emphasize knowledge prior to model the scene graph in a fully-connected homogeneous graph framework. However, these methods may lead to indiscriminate message passing among graph nodes (i.e., objects), resulting in sub-optimal performance. In this paper, we propose a 3D Heterogeneous Scene Graph Prediction (3D-HetSGP) framework, which performs graph reasoning on the 3D scene graph in a heterogeneous fashion. Specifically, our method consists of two stages: a heterogeneous graph structure learning (HGSL) stage and a heterogeneous graph reasoning (HGR) stage. In the HGSL stage, we learn the graph structure by predicting the types of different directed edges. In the HGR stage, message passing among nodes is performed on the learned graph structure for scene graph prediction. Extensive experiments show that our method achieves comparable or superior performance to existing methods on 3DSSG dataset.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Chen, T., Yu, W., Chen, R., Lin, L.: Knowledge-embedded routing network for scene graph generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6163–6171 (2019)

    Google Scholar 

  2. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS). vol. 27 (2014)

    Google Scholar 

  3. Feng, M., Hou, H., Zhang, L., Wu, Z., Guo, Y., Mian, A.: 3D spatial multimodal knowledge accumulation for scene graph prediction in point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9182–9191 (2023)

    Google Scholar 

  4. Hu, Q., et al.: RandLA-Net: efficient semantic segmentation of large-scale point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11108–11117 (2020)

    Google Scholar 

  5. Hu, Z., Dong, Y., Wang, K., Sun, Y.: Heterogeneous graph transformer. In: Proceedings of the Web Conference 2020, pp. 2704–2710 (2020)

    Google Scholar 

  6. Johnson, J., et al.: Image retrieval using scene graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3668–3678 (2015)

    Google Scholar 

  7. Krishna, R., et al.: Visual genome: connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vis. (IJCV) 123, 32–73 (2017)

    Article  MathSciNet  Google Scholar 

  8. Li, R., Zhang, S., He, X.: SGTR: end-to-end scene graph generation with transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19486–19496 (2022)

    Google Scholar 

  9. Liu, H., Guo, Y., Ma, Y., Lei, Y., Wen, G.: Semantic context encoding for accurate 3D point cloud segmentation. IEEE Trans. Multimedia (TMM) 23, 2045–2055 (2021)

    Article  Google Scholar 

  10. Liu, H., Ma, Y., Hu, Q., Guo, Y.: CenterTube: tracking multiple 3D objects with 4D tubelets in dynamic point clouds. IEEE Trans. Multimedia (TMM) 25, 8793–8804 (2023)

    Article  Google Scholar 

  11. Liu, H., Ma, Y., Wang, H., Guo, Y.: AnchorPoint: query design for transformer-based 3D object detection and tracking. IEEE Trans. Intell. Transp. Syst. (TITS) 24(10), 10988–11000 (2023)

    Article  Google Scholar 

  12. Lv, C., Qi, M., Li, X., Yang, Z., Ma, H.: SGFormer: semantic graph transformer for point cloud-based 3d scene graph generation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 4035–4043 (2024)

    Google Scholar 

  13. Ma, Y., Guo, Y., Liu, H., Lei, Y., Wen, G.: Global context reasoning for semantic segmentation of 3D point clouds. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2931–2940 (2020)

    Google Scholar 

  14. Monninger, T., et al.: SCENE: reasoning about traffic scenes using heterogeneous graph neural networks. IEEE Robot. Autom. Lett. (RAL) 8(3), 1531–1538 (2023)

    Article  Google Scholar 

  15. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 652–660 (2017)

    Google Scholar 

  16. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pp. 5099–5108 (2017)

    Google Scholar 

  17. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 8748–8763. PMLR (2021)

    Google Scholar 

  18. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), vol. 28 (2015)

    Google Scholar 

  19. Rosinol, A., Gupta, A., Abate, M., Shi, J., Carlone, L.: 3D dynamic scene graphs: Actionable spatial perception with places, objects, and humans. arXiv preprint arXiv:2002.06289 (2020)

  20. Sahand, S., Sina, M.B., Volker, T.: Classification by attention: scene graph classification with prior knowledge. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). vol. 35, pp. 5025–5033 (2021)

    Google Scholar 

  21. Shi, C., Hu, B., Zhao, W.X., Philip, S.Y.: Heterogeneous information network embedding for recommendation. IEEE Trans. Knowl. Data Eng. (TKDE) 31(2), 357–370 (2018)

    Article  Google Scholar 

  22. Shit, S., et al.: Relationformer: a unified framework for image-to-graph generation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022. ECCV 2022. LNCS, vol. 13697, pp. 422–439. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19836-6_24

  23. Sun, Y., Han, J.: Mining heterogeneous information networks: a structural analysis approach. ACM SIGKDD Explor. Newsl. 14(2), 20–28 (2013)

    Article  Google Scholar 

  24. Tahara, T., Seno, T., Narita, G., Ishikawa, T.: Retargetable AR: context-aware augmented reality in indoor scenes based on 3D scene graph. In: 2020 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pp. 249–255 (2020)

    Google Scholar 

  25. Tang, K., Niu, Y., Huang, J., Shi, J., Zhang, H.: Unbiased scene graph generation from biased training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3716–3725 (2020)

    Google Scholar 

  26. Tang, K., Zhang, H., Wu, B., Luo, W., Liu, W.: Learning to compose dynamic tree structures for visual contexts. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6619–6628 (2019)

    Google Scholar 

  27. Wald, J., Avetisyan, A., Navab, N., Tombari, F., Nießner, M.: RIO: 3D object instance relocalization in changing indoor environments. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7658–7667 (2019)

    Google Scholar 

  28. Wald, J., Dhamo, H., Navab, N., Tombari, F.: Learning 3D semantic scene graphs from 3D indoor reconstructions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3961–3970 (2020)

    Google Scholar 

  29. Wang, X., et al.: Heterogeneous graph attention network. In: Proceedings of the World Wide Web conference, pp. 2022–2032 (2019)

    Google Scholar 

  30. Wang, Z., Cheng, B., Zhao, L., Xu, D., Tang, Y., Sheng, L.: VL-SAT: visual-linguistic semantics assisted training for 3D semantic scene graph prediction in point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 21560–21569 (2023)

    Google Scholar 

  31. Wu, S.C., Wald, J., Tateno, K., Navab, N., Tombari, F.: SceneGraphFusion: incremental 3D scene graph prediction from RGB-D sequences. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7515–7525 (2021)

    Google Scholar 

  32. Xu, D., Zhu, Y., Choy, C.B., Fei-Fei, L.: Scene graph generation by iterative message passing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5410–5419 (2017)

    Google Scholar 

  33. Xu, D., Zhu, Y., Choy, C.B., Fei-Fei, L.: Scene graph generation by iterative message passing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5410–5419 (2017)

    Google Scholar 

  34. Yang, J., Lu, J., Lee, S., Batra, D., Parikh, D.: Graph R-CNN for scene graph generation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 670–685 (2018)

    Google Scholar 

  35. Yin, T., Zhou, X., Krahenbuhl, P.: Center-based 3D object detection and tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11784–11793 (2021)

    Google Scholar 

  36. Yoon, K., Kim, K., Moon, J., Park, C.: Unbiased heterogeneous scene graph generation with relation-aware message passing neural network. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). vol. 37, pp. 3285–3294 (2023)

    Google Scholar 

  37. Zareian, A., Karaman, S., Chang, S.-F.: Bridging knowledge graphs to generate scene graphs. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12368, pp. 606–623. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1_36

    Chapter  Google Scholar 

  38. Zellers, R., Yatskar, M., Thomson, S., Choi, Y.: Neural motifs: scene graph parsing with global context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5831–5840 (2018)

    Google Scholar 

  39. Zhang, C., Yu, J., Song, Y., Cai, W.: Exploiting edge-oriented reasoning for 3D point-based scene graph analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9705–9715 (2021)

    Google Scholar 

  40. Zhang, C., Song, D., Huang, C., Swami, A., Chawla, N.V.: Heterogeneous graph neural network. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 793–803 (2019)

    Google Scholar 

  41. Zhang, S., Hao, A., Qin, H., et al.: Knowledge-inspired 3D scene graph prediction in point cloud. Proc. Adv. Neural Inf. Process. Syst. (NeruIPS) 34, 18620–18632 (2021)

    Google Scholar 

  42. Zhang, Y., Hu, Q., Xu, G., Ma, Y., Wan, J., Guo, Y.: Not all points are equal: learning highly efficient point-based detectors for 3D lidar point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

    Google Scholar 

  43. Zhao, J., Wang, X., Shi, C., Hu, B., Song, G., Ye, Y.: Heterogeneous graph structure learning for graph neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). vol. 35, pp. 4697–4705 (2021)

    Google Scholar 

Download references

Acknowledgement

This work was partially supported by the National Natural Science Foundation of China (No. U20A20185, 62372491), the Guangdong Basic and Applied Basic Research Foundation (2022B1515020103, 2023B1515120087), the Shenzhen Science and Technology Program (No. RCYX20200714114641140).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yulan Guo .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 757 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Ma, Y., Liu, H., Pei, Y., Guo, Y. (2025). Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15084. Springer, Cham. https://doi.org/10.1007/978-3-031-73347-5_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-73347-5_16

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73346-8

  • Online ISBN: 978-3-031-73347-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics