TNT-Net: Point Cloud Completion by Transformer in Transformer

Zhang, Xiaohai; Zhang, Jinming; Li, Jianliang; Chen, Ming

doi:10.1007/978-3-031-53308-2_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14555)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Estimating the overall structure of a point cloud from a partial 3D point cloud input is a crucial task in computer vision. However, existing point cloud completion methods often overlook object detail and the local correlations within the incomplete point cloud. To address this challenge, we propose TNT-Net, an enhanced point cloud completion approach that uses a transformer-in-transformer architecture for accurate and refined completion. TNT-Net incorporates a local feature extraction module to capture long-range correlations within the input point cloud. Moreover, we introduce stacked feature extractors to simplify subsequent calculations and gather more comprehensive feature information on the spatial distribution of the point cloud. Additionally, we present an efficient method that integrates a kNN-transformer into the existing point transformer to address the lack of local detail information in previous works, enabling TNT-Net to capture fine-grained object details and local correlations more effectively. Extensive experiments on the synthetic datasets PCN and ShapeNet55/34 and the real-world dataset KITTI demonstrate that TNT-Net outperforms state-of-the-art methods both quantitatively and qualitatively.
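To make the idea of attention restricted to k-nearest-neighbour (kNN) groups concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes raw features serve as queries, keys, and values (no learned projections, no positional encoding), and the function names `knn_indices` and `knn_attention` are hypothetical. Each point aggregates features only from its k nearest points, which is one simple way to emphasise local correlations.

```python
import math


def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest points (itself included)."""
    result = []
    for p in points:
        # Sort by Euclidean distance; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points))
        result.append([j for _, j in dists[:k]])
    return result


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knn_attention(points, feats, k):
    """Scaled dot-product attention where each point attends only to its kNN group.

    Assumption: features act directly as queries, keys, and values.
    """
    dim = len(feats[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for i, nbrs in enumerate(knn_indices(points, k)):
        scores = [scale * sum(a * b for a, b in zip(feats[i], feats[j])) for j in nbrs]
        weights = softmax(scores)
        out.append([sum(w * feats[j][d] for w, j in zip(weights, nbrs))
                    for d in range(dim)])
    return out


# Two well-separated clusters: attention with k=2 never mixes their features.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
out = knn_attention(points, feats, k=2)
```

In this toy input each point's neighbourhood stays inside its own cluster, so the output features remain cluster-specific. Global attention over all four points would instead blend the two clusters.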

Acknowledgement

This work was sponsored by the Natural Science Foundation of Xinjiang Uygur Autonomous Region under Grant 2022D01C690.

Author information

Authors and Affiliations

School of Computer Science and Technology, Xinjiang University, Urumqi, China
Xiaohai Zhang, Jinming Zhang, Jianliang Li & Ming Chen

Corresponding author

Correspondence to Jinming Zhang.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
Delft University of Technology, Delft, The Netherlands
Alan Hanjalic
Delft University of Technology, Delft, The Netherlands
Cynthia Liem
University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring
Reykjavik University, Reykjavik, Iceland
Björn Þór Jónsson
Microsoft Research Lab – Asia, Beijing, China
Bei Liu
The University of Tokyo, Tokyo, Japan
Yoko Yamakata

About this paper

Cite this paper

Zhang, X., Zhang, J., Li, J., Chen, M. (2024). TNT-Net: Point Cloud Completion by Transformer in Transformer. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14555. Springer, Cham. https://doi.org/10.1007/978-3-031-53308-2_25

DOI: https://doi.org/10.1007/978-3-031-53308-2_25
Published: 28 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53307-5
Online ISBN: 978-3-031-53308-2
eBook Packages: Computer Science, Computer Science (R0)
