FH-Net: A Fast Hierarchical Network for Scene Flow Estimation on Real-World Point Clouds

Ding, Lihe; Dong, Shaocong; Xu, Tingfa; Xu, Xinli; Wang, Jie; Li, Jianan

doi:10.1007/978-3-031-19842-7_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13699)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Estimating scene flow from real-world point clouds is a fundamental task for practical 3D vision. Previous methods often rely on deep models that first extract expensive per-point features at full resolution and then obtain the flow through either a complex matching mechanism or feature decoding, which incurs high computational cost and latency. In this work, we propose a fast hierarchical network, FH-Net, which directly computes the flow of key points through a lightweight Trans-flow layer that exploits a reliable local geometry prior, and optionally back-propagates the computed sparse flows through an inverse Trans-up layer to obtain hierarchical flows at different resolutions. To focus more on challenging dynamic objects, we also provide a new copy-and-paste data augmentation technique based on the generation of dynamic object pairs. Moreover, to alleviate the chronic shortage of real-world training data, we contribute two new large-scale datasets to this field by collecting lidar-scanned point clouds from public autonomous driving datasets and annotating the collected data through novel pseudo-labeling. Extensive experiments on both public and proposed datasets show that our method outperforms prior state-of-the-art methods while running at least $7\times$ faster, at 113 FPS. Code and data are released at https://github.com/pigtigger/FH-Net.
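The copy-and-paste augmentation is described above only at a high level. The following is a minimal sketch of the general idea, not the authors' implementation: an object is pasted into the first frame, a rigidly moved copy is pasted into the second frame, and the per-point displacement becomes the ground-truth flow for the pasted points. The function names `move_object` and `paste_dynamic_pair`, the choice of a yaw rotation about the object centroid, and the toy data are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def move_object(points: List[Point], yaw: float, translation: Point) -> List[Point]:
    """Rotate points about their centroid around the z-axis, then translate.

    Hypothetical helper: a simple rigid motion stands in for however
    the real method samples object motion.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    moved = []
    for x, y, z in points:
        dx, dy = x - cx, y - cy
        moved.append((cx + c * dx - s * dy + tx, cy + s * dx + c * dy + ty, z + tz))
    return moved


def paste_dynamic_pair(frame1, frame2, flow1, obj_points, yaw, translation):
    """Paste an object into frame1 and its moved copy into frame2.

    Returns the augmented (frame1, frame2, flow1). The flow of each
    pasted point is its position after the motion minus its position before.
    """
    moved = move_object(obj_points, yaw, translation)
    obj_flow = [
        (mx - x, my - y, mz - z)
        for (x, y, z), (mx, my, mz) in zip(obj_points, moved)
    ]
    return frame1 + list(obj_points), frame2 + moved, flow1 + obj_flow


# Toy scene: one static background point and a three-point "car" (assumed data).
frame1 = [(0.0, 0.0, 0.0)]
frame2 = [(0.0, 0.0, 0.0)]
flow1 = [(0.0, 0.0, 0.0)]
car = [(10.0, 2.0, 0.5), (11.0, 2.0, 0.5), (10.0, 3.0, 0.5)]

aug1, aug2, aug_flow = paste_dynamic_pair(
    frame1, frame2, flow1, car, yaw=0.0, translation=(1.0, 0.0, 0.0)
)
```

In this sketch the pasted points carry exact flow labels by construction, which is one reason pair-based pasting suits scene flow: both frames and the supervision stay consistent. A practical version would also need to avoid collisions with existing scene points and respect lidar occlusion, which this example does not attempt.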

L. Ding and S. Dong contributed equally.

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (No. 62101032), the Postdoctoral Science Foundation of China (No. 2021M690015), and Beijing Institute of Technology Research Fund Program for Young Scholars (No. 3040011182111).

Author information

Authors and Affiliations

Beijing Institute of Technology, Beijing, China
Lihe Ding, Shaocong Dong, Tingfa Xu, Xinli Xu, Jie Wang & Jianan Li

Corresponding authors

Correspondence to Tingfa Xu or Jianan Li.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Cite this paper

Ding, L., Dong, S., Xu, T., Xu, X., Wang, J., Li, J. (2022). FH-Net: A Fast Hierarchical Network for Scene Flow Estimation on Real-World Point Clouds. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13699. Springer, Cham. https://doi.org/10.1007/978-3-031-19842-7_13

DOI: https://doi.org/10.1007/978-3-031-19842-7_13
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19841-0
Online ISBN: 978-3-031-19842-7
eBook Packages: Computer Science, Computer Science (R0)
