Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection

Yuan, Shuaihang; Li, Xiang; Huang, Hao; Fang, Yi

doi:10.1007/978-3-031-26319-4_15

Shuaihang Yuan^6,7,8,9,
Xiang Li^6,7,8,9,
Hao Huang^6,7,8,9 &
…
Yi Fang^6,7,8,9

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13841))

Included in the following conference series:

Asian Conference on Computer Vision

459 Accesses

Abstract

This paper addresses the problem of few-shot indoor 3D object detection by proposing a meta-learning-based framework that only relies on a few labeled samples from novel classes for training. Our model has two major components: a 3D meta-detector and a 3D object detector. Given a query 3D point cloud and a few support samples, the 3D meta-detector is trained over different 3D detection tasks to learn task distributions for different object classes and dynamically adapt the 3D object detector to complete a specific detection task. The 3D object detector takes task-specific information as input and produces 3D object detection results for the query point cloud. Specifically, the 3D object detector first extracts object candidates and their features from the query point cloud using a point feature learning network. Then, a class-specific re-weighting module generates class-specific re-weighting vectors from the support samples to characterize the task information, one for each distinct object class. Each re-weighting vector performs channel-wise attention to the candidate features to re-calibrate the query object features, adapting them to detect objects of the same classes. Finally, the adapted features are fed into a detection head to predict classification scores and bounding boxes for novel objects in the query point cloud. Several experiments on two 3D object detection benchmark datasets demonstrate that our proposed method acquired the ability to detect 3D objects in the few-shot setting.

S. Yuan and X. Li—Equal contribution.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Afif, M., Ayachi, R., Said, Y., Pissaloux, E., Atri, M.: An evaluation of retinanet on indoor object detection for blind and visually impaired persons assistance navigation. Neural Process. Lett. 51, 2265–2279 (2020)
Article Google Scholar
Yang, S., Scherer, S.: Cubeslam: monocular 3-D object slam. IEEE Trans. Rob. 35, 925–938 (2019)
Article Google Scholar
Qi, C.R., Chen, X., Litany, O., Guibas, L.J.: Imvotenet: boosting 3D object detection in point clouds with image votes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4404–4413 (2020)
Google Scholar
Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep hough voting for 3D object detection in point clouds. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9277–9286 (2019)
Google Scholar
Hospedales, T.M., Antoniou, A., Micaelli, P., Storkey, A.J.: Meta-learning in neural networks: a survey. Trans. Pattern Anal. Mach. Intell. (2021)
Google Scholar
Huisman, M., van Rijn, J.N., Plaat, A.: A survey of deep meta-learning. Artif. Intell. Rev. 54(6), 4483–4541 (2021). https://doi.org/10.1007/s10462-021-10004-4
Article Google Scholar
Nie, J., Xu, N., Zhou, M., Yan, G., Wei, Z.: 3D model classification based on few-shot learning. Neurocomputing 398, 539–546 (2020)
Article Google Scholar
Zhang, B., Wonka, P.: Training data generating networks: linking 3D shapes and few-shot classification. arXiv preprint arXiv:2010.08276 (2020)
Huang, H., Li, X., Wang, L., Fang, Y.: 3D-metaconnet: meta-learning for 3D shape classification and segmentation. In: International Conference on 3D Vision, pp. 982–991. IEEE (2021)
Google Scholar
Yuan, S., Fang, Y.: Ross: Robust learning of one-shot 3D shape segmentation. In: The IEEE Winter Conference on Applications of Computer Vision, pp. 1961–1969 (2020)
Google Scholar
Wang, L., Li, X., Fang, Y.: Few-shot learning of part-specific probability space for 3D shape segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4504–4513 (2020)
Google Scholar
Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: Scannet: richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839 (2017)
Google Scholar
Song, S., Lichtenberg, S.P., Xiao, J.: Sun RGB-D: a RGB-D scene understanding benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 567–576 (2015)
Google Scholar
Wang, D.Z., Posner, I.: Voting for voting in online point cloud object detection. In: Robotics: Science and Systems, vol. 1, pp. 10–15, Rome, Italy (2015)
Google Scholar
Yang, B., et al.: Learning object bounding boxes for 3D instance segmentation on point clouds. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Google Scholar
Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, vol. 2, Lille (2015)
Google Scholar
Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. In: Advances in Neural Information Processing Systems, pp. 3630–3638 (2016)
Google Scholar
Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: Advances in Neural Information Processing Systems, pp. 4077–4087 (2017)
Google Scholar
Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., Hospedales, T.M.: Learning to compare: relation network for few-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199–1208 (2018)
Google Scholar
Bertinetto, L., Henriques, J.F., Valmadre, J., Torr, P., Vedaldi, A.: Learning feed-forward one-shot learners. In: Advances in Neural Information Processing Systems, pp. 523–531 (2016)
Google Scholar
Graves, A., Wayne, G., Danihelka, I.: Neural turing machines. arXiv preprint arXiv:1410.5401 (2014)
Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400 (2017)
Li, Z., Zhou, F., Chen, F., Li, H.: Meta-SGD: learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835 (2017)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)
Google Scholar
Leibe, B., Leonardis, A., Schiele, B.: Combined object categorization and segmentation with an implicit shape model. In: Workshop on Statistical Learning in Computer Vision, ECCV, vol. 2, p. 7 (2004)
Google Scholar
Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660 (2017)
Google Scholar
Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: Meta-learning with memory-augmented neural networks. In: International Conference on Machine Learning, pp. 1842–1850. PMLR (2016)
Google Scholar
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Chapter Google Scholar
Xie, Q., et al.: Mlcvnet: multi-level context votenet for 3D object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10447–10456 (2020)
Google Scholar
Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
Google Scholar
Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
Google Scholar

Download references

Acknowledgements

The authors appreciate the generous support provided by Inception Institute of Artificial Intelligence (IIAI) in the form of NYUAD Global Ph.D. Student Fellowship. This work was also partially supported by the NYUAD Center for Artificial Intelligence and Robotics (CAIR), funded by Tamkeen under the NYUAD Research Institute Award CG010.

Author information

Authors and Affiliations

NYU Multimedia and Visual Computing Lab, New York, USA
Shuaihang Yuan, Xiang Li, Hao Huang & Yi Fang
NYUAD Center for Artificial Intelligence and Robotics (CAIR), Abu Dhabi, United Arab Emirates
Shuaihang Yuan, Xiang Li, Hao Huang & Yi Fang
NYU Tandon School of Engineering, New York University, New York, USA
Shuaihang Yuan, Xiang Li, Hao Huang & Yi Fang
New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
Shuaihang Yuan, Xiang Li, Hao Huang & Yi Fang

Authors

Shuaihang Yuan
View author publications
You can also search for this author in PubMed Google Scholar
Xiang Li
View author publications
You can also search for this author in PubMed Google Scholar
Hao Huang
View author publications
You can also search for this author in PubMed Google Scholar
Yi Fang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yi Fang .

Editor information

Editors and Affiliations

University of Wollongong, Wollongong, NSW, Australia
Lei Wang
University of Bonn, Bonn, Germany
Juergen Gall
University of Adelaide, Adelaide, SA, Australia
Tat-Jun Chin
National Institute of Informatics, Tokyo, Japan
Imari Sato
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 89 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Yuan, S., Li, X., Huang, H., Fang, Y. (2023). Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13841. Springer, Cham. https://doi.org/10.1007/978-3-031-26319-4_15

Download citation

DOI: https://doi.org/10.1007/978-3-031-26319-4_15
Published: 04 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26318-7
Online ISBN: 978-3-031-26319-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection