GFENet: Group-Free Enhancement Network for Indoor Scene 3D Object Detection

Zhou, Feng; Dai, Ju; Pan, Junjun; Zhu, Mengxiao; Cai, Xingquan; Huang, Bin; Wang, Chen

doi:10.1007/978-3-031-50075-6_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14497)

Included in the following conference series:

Computer Graphics International Conference

Abstract

The state-of-the-art group-free network (GFNet) has achieved superior performance for indoor scene 3D object detection. However, we find there is still room for improvement in three aspects. First, the seed point features extracted by the multi-layer perceptron (MLP) in the backbone (PointNet++) do not account for the differing importance of the features at each level. Second, the single-scale transformer module that GFNet uses in place of hand-crafted grouping via Hough voting cannot adequately model the relationship between points and objects. Third, GFNet predicts detection results directly from the decoders, disregarding the different contributions of the decoders at each stage. In this paper, we propose the group-free enhancement network (GFENet) to tackle these issues. Specifically, our network mainly consists of three lifting modules: the weighted MLP (WMLP) module, the hierarchical-aware module, and the stage-aware module. The WMLP module adaptively combines features of different levels in the backbone before max-pooling for informative feature learning. The hierarchical-aware module models points and objects hierarchically to mitigate the negative impact of insufficiently modeling their relationship. The stage-aware module adaptively aggregates multi-stage predictions for better detection performance. Extensive experiments on the ScanNet V2 and SUN RGB-D datasets demonstrate the effectiveness and advantages of our method over existing 3D object detection methods.
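The abstract does not give the exact formulation of the WMLP or stage-aware modules, so the following is only a minimal illustrative sketch of the two ideas it names: weighting features from different levels before max-pooling, and weighting predictions from different decoder stages. The softmax weighting, the function names, and the scalar per-level and per-stage logits are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_level_fusion(level_feats, level_logits):
    """Fuse per-level features with softmax weights, then max-pool over points.

    level_feats:  list over levels; each level is a list of points, and each
                  point is a list of C floats (all levels share one shape).
    level_logits: one scalar per level (hypothetical parameterization; in a
                  trained network these would be learned).
    Returns one C-dimensional feature for the group of points.
    """
    weights = softmax(level_logits)
    num_points = len(level_feats[0])
    channels = len(level_feats[0][0])
    # Weighted sum across levels, computed per point and per channel.
    fused = [
        [sum(w * level[p][c] for w, level in zip(weights, level_feats))
         for c in range(channels)]
        for p in range(num_points)
    ]
    # Max-pool across the points of the group, channel by channel.
    return [max(fused[p][c] for p in range(num_points))
            for c in range(channels)]


def aggregate_stage_predictions(stage_scores, stage_logits):
    """Combine per-stage prediction scores with softmax weights.

    stage_scores: list over decoder stages; each stage is a list of scores.
    stage_logits: one scalar per stage (assumed, as above).
    """
    weights = softmax(stage_logits)
    num_items = len(stage_scores[0])
    return [sum(w * stage[i] for w, stage in zip(weights, stage_scores))
            for i in range(num_items)]
```

With equal logits both functions reduce to a plain average across levels or stages; unequal logits let one level or stage dominate, which is the kind of adaptivity the abstract describes.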

Acknowledgements

This work was supported by the Beijing Natural Science Foundation (4232023), the R&D Program of Beijing Municipal Education Commission (KM202310009002), and the National Natural Science Foundation of China (62102208). The authors also thank the editor and all the reviewers for their very helpful comments to improve this paper.

Author information

Authors and Affiliations

North China University of Technology, Beijing, China
Feng Zhou, Mengxiao Zhu & Xingquan Cai
Peng Cheng Laboratory, Shenzhen, China
Ju Dai & Junjun Pan
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Junjun Pan
AI Research Center, Hangzhou Innovation Institute, Beihang University, Hangzhou, China
Bin Huang
Beijing Technology and Business University, Beijing, China
Chen Wang

Corresponding author

Correspondence to Ju Dai.

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Bin Sheng
Shanghai Jiao Tong University, Shanghai, China
Lei Bi
University of Sydney, Sydney, NSW, Australia
Jinman Kim
MIRALab-CUI, University of Geneva, Carouge, Geneve, Switzerland
Nadia Magnenat-Thalmann
Swiss Federal Institute of Technology, Lausanne, Switzerland
Daniel Thalmann

About this paper

Cite this paper

Zhou, F. et al. (2024). GFENet: Group-Free Enhancement Network for Indoor Scene 3D Object Detection. In: Sheng, B., Bi, L., Kim, J., Magnenat-Thalmann, N., Thalmann, D. (eds) Advances in Computer Graphics. CGI 2023. Lecture Notes in Computer Science, vol 14497. Springer, Cham. https://doi.org/10.1007/978-3-031-50075-6_10

DOI: https://doi.org/10.1007/978-3-031-50075-6_10
Published: 22 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50074-9
Online ISBN: 978-3-031-50075-6
eBook Packages: Computer Science, Computer Science (R0)
