GFENet: Group-Free Enhancement Network for Indoor Scene 3D Object Detection

  • Conference paper
Advances in Computer Graphics (CGI 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14497))

Abstract

The state-of-the-art group-free network (GFNet) has achieved superior performance for indoor scene 3D object detection. However, we find there is still room for improvement in three aspects. First, the seed point features extracted by the multi-layer perceptron (MLP) in the backbone (PointNet++) ignore the differing importance of features at each level. Second, the single-scale transformer module that GFNet uses to address hand-crafted grouping via Hough voting cannot adequately model the relationship between points and objects. Finally, GFNet directly uses the decoders to predict detection results, disregarding the different contributions of the decoders at each stage. In this paper, we propose the group-free enhancement network (GFENet) to tackle these issues. Specifically, our network mainly consists of three lifting modules: the weighted MLP (WMLP) module, the hierarchical-aware module, and the stage-aware module. The WMLP module adaptively combines features of different levels in the backbone before max-pooling for informative feature learning. The hierarchical-aware module models points and objects hierarchically to mitigate the negative impact of insufficiently modeling their relationship. The stage-aware module adaptively aggregates multi-stage predictions for better detection performance. Extensive experiments on the ScanNet V2 and SUN RGB-D datasets demonstrate the effectiveness and advantages of our method over existing 3D object detection methods.
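The abstract does not give the exact formulation of the WMLP or stage-aware modules, so the following is only a minimal sketch of what "adaptive" combination could look like, under stated assumptions: each MLP level (or decoder stage) gets one scalar score, the scores are normalized with a softmax, and the features or predictions are combined as a weighted sum. The function names, the softmax normalization, and the assumption that all levels share one feature dimension are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(vectors, weights):
    """Combine equal-length vectors element-wise with the given weights."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def fuse_levels_then_maxpool(level_feats, level_scores):
    """Hypothetical WMLP-style fusion (an assumption, not the paper's code).

    level_feats[l][p] is the feature vector of point p at MLP level l;
    every level is assumed to have the same feature dimension.
    Levels are fused per point with softmax weights, then the fused
    features are max-pooled over the points of the local region.
    """
    weights = softmax(level_scores)
    num_points = len(level_feats[0])
    fused = [
        weighted_sum([level[p] for level in level_feats], weights)
        for p in range(num_points)
    ]
    dim = len(fused[0])
    return [max(f[i] for f in fused) for i in range(dim)]


def aggregate_stages(stage_preds, stage_scores):
    """Hypothetical stage-aware aggregation (an assumption).

    stage_preds[s] is the prediction vector from decoder stage s;
    the output is their softmax-weighted average.
    """
    return weighted_sum(stage_preds, softmax(stage_scores))


# Two levels, two points, two channels; equal scores give equal weights.
level_feats = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 2.0], [2.0, 5.0]],
]
pooled = fuse_levels_then_maxpool(level_feats, [0.0, 0.0])
merged = aggregate_stages([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

In a trained network the scores would be learnable parameters rather than fixed values; with equal scores, as here, the sketch reduces to plain averaging of levels or stages.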

Acknowledgements

This work was supported by Beijing Natural Science Foundation (4232023), R&D Program of Beijing Municipal Education Commission (KM202310009002), and National Natural Science Foundation of China (62102208). The authors also thank the editor and all the reviewers for their very helpful comments to improve this paper.

Author information

Corresponding author

Correspondence to Ju Dai.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, F. et al. (2024). GFENet: Group-Free Enhancement Network for Indoor Scene 3D Object Detection. In: Sheng, B., Bi, L., Kim, J., Magnenat-Thalmann, N., Thalmann, D. (eds) Advances in Computer Graphics. CGI 2023. Lecture Notes in Computer Science, vol 14497. Springer, Cham. https://doi.org/10.1007/978-3-031-50075-6_10

  • DOI: https://doi.org/10.1007/978-3-031-50075-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50074-9

  • Online ISBN: 978-3-031-50075-6

  • eBook Packages: Computer Science, Computer Science (R0)
