Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection

  • Conference paper
Computer Vision – ACCV 2022 (ACCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13841)

Abstract

This paper addresses few-shot indoor 3D object detection with a meta-learning-based framework that needs only a few labeled samples from novel classes for training. Our model has two major components: a 3D meta-detector and a 3D object detector. Given a query 3D point cloud and a few support samples, the 3D meta-detector is trained over different 3D detection tasks to learn task distributions for different object classes and to dynamically adapt the 3D object detector to a specific detection task. The 3D object detector takes this task-specific information as input and produces 3D object detection results for the query point cloud. Specifically, the 3D object detector first extracts object candidates and their features from the query point cloud using a point feature learning network. Then, a class-specific re-weighting module generates re-weighting vectors from the support samples to characterize the task information, one vector for each distinct object class. Each re-weighting vector applies channel-wise attention to the candidate features to re-calibrate the query object features, adapting them to detect objects of the corresponding class. Finally, the adapted features are fed into a detection head to predict classification scores and bounding boxes for novel objects in the query point cloud. Experiments on two 3D object detection benchmark datasets demonstrate that the proposed method can detect 3D objects in the few-shot setting.
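The re-weighting step described in the abstract can be illustrated with a minimal sketch. It assumes that "channel-wise attention" means element-wise scaling of each candidate feature by the class's re-weighting vector; the abstract does not give the exact operation, so this is an assumption, not the authors' implementation. The function name `reweight_candidates`, the class names, and the toy values are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def reweight_candidates(
    candidate_features: List[Vector],
    class_vectors: Dict[str, Vector],
) -> Dict[str, List[Vector]]:
    """Scale every candidate feature channel by channel, once per class.

    Assumption: channel-wise attention is modeled as an element-wise
    product between a candidate feature and a class re-weighting vector.
    """
    adapted: Dict[str, List[Vector]] = {}
    for class_name, weights in class_vectors.items():
        adapted[class_name] = []
        for feature in candidate_features:
            if len(feature) != len(weights):
                raise ValueError("feature and re-weighting vector must have the same number of channels")
            adapted[class_name].append([f * w for f, w in zip(feature, weights)])
    return adapted


# Toy data: two object candidates with three feature channels each,
# and one re-weighting vector per (hypothetical) novel class.
candidates = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]
vectors = {"chair": [1.0, 0.0, 2.0], "table": [0.5, 1.0, 1.0]}

adapted = reweight_candidates(candidates, vectors)
for name, features in adapted.items():
    print(name, features)
```

In this sketch each class receives its own copy of the candidate features, so a detection head could score the same candidates separately for every class represented in the support set.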

S. Yuan and X. Li contributed equally.

Acknowledgements

The authors appreciate the generous support provided by Inception Institute of Artificial Intelligence (IIAI) in the form of NYUAD Global Ph.D. Student Fellowship. This work was also partially supported by the NYUAD Center for Artificial Intelligence and Robotics (CAIR), funded by Tamkeen under the NYUAD Research Institute Award CG010.

Corresponding author

Correspondence to Yi Fang.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 89 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yuan, S., Li, X., Huang, H., Fang, Y. (2023). Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13841. Springer, Cham. https://doi.org/10.1007/978-3-031-26319-4_15

  • DOI: https://doi.org/10.1007/978-3-031-26319-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26318-7

  • Online ISBN: 978-3-031-26319-4

  • eBook Packages: Computer Science, Computer Science (R0)