Sequential Multi-view Fusion Network for Fast LiDAR Point Motion Estimation

Zhang, Gang; Li, Xiaoyan; Wang, Zhenhua

doi:10.1007/978-3-031-20047-2_17

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13682)

Included in the following conference series:

European Conference on Computer Vision

Abstract

LiDAR point motion estimation, which comprises motion state prediction and velocity estimation, is crucial for understanding dynamic scenes in autonomous driving. Recent 2D projection-based methods run in real time by applying well-optimized 2D convolutional networks to either the bird’s-eye view (BEV) or the range view (RV), but they lose accuracy because information is discarded during the 2D projection. We therefore propose a novel sequential multi-view fusion network (SMVF), composed of a BEV branch and an RV branch that encode motion information and spatial information, respectively. By looking from distinct views and integrating the original LiDAR point features, SMVF produces a comprehensive motion prediction while remaining efficient. Moreover, to help motion estimation generalize to objects with fewer training samples, we propose sequential instance copy-paste (SICP), which generates realistic LiDAR sequences for these objects. Experiments on the SemanticKITTI moving object segmentation (MOS) and Waymo scene flow benchmarks demonstrate that SMVF outperforms all existing methods by a large margin.
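
The two 2D projections named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, grid extents, cell size, image resolution, and vertical field of view below are all assumptions chosen for illustration. The sketch only shows how one LiDAR point receives both a BEV cell index and an RV pixel index, which is what lets per-point features be gathered from both views.

```python
import numpy as np


def bev_indices(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), cell=0.5):
    """Map each point to a (row, col) cell of a bird's-eye-view grid.

    The ranges and cell size are illustrative assumptions, not values
    taken from the paper.
    """
    x, y = points[:, 0], points[:, 1]
    cols = np.floor((x - x_range[0]) / cell).astype(np.int64)
    rows = np.floor((y - y_range[0]) / cell).astype(np.int64)
    width = int(round((x_range[1] - x_range[0]) / cell))
    height = int(round((y_range[1] - y_range[0]) / cell))
    # Points outside the grid are flagged rather than clipped.
    valid = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    return rows, cols, valid


def rv_indices(points, height=64, width=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map each point to a (row, col) pixel of a range-view image.

    Uses a standard spherical projection; the image size and vertical
    field of view are illustrative assumptions.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                        # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(rng, 1e-8))  # elevation angle
    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (fov_up - pitch) / (fov_up - fov_down) * height
    cols = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v), 0, height - 1).astype(np.int64)
    return rows, cols, rng


# Three example points (x, y, z) in metres; the last lies outside the BEV grid.
example_points = np.array([
    [10.0, 0.0, 0.0],
    [0.0, 10.0, 0.0],
    [100.0, 0.0, 0.0],
])
bev_rows, bev_cols, bev_valid = bev_indices(example_points)
rv_rows, rv_cols, rv_range = rv_indices(example_points)
```

Because both functions return one index per input point, features computed on either 2D grid can be read back at those indices and combined with the original per-point features. How SMVF actually fuses them is described in the full paper, not in this sketch.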

Author information

Authors and Affiliations

Beijing Municipal Key Lab of Multimedia and Intelligent Software Technology, Beijing Artificial Intelligence Institute, Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
Xiaoyan Li
Damo Academy, Alibaba Group, Hangzhou, China
Gang Zhang
Cenozoic Robot, Lanzhou, China
Zhenhua Wang

Corresponding author

Correspondence to Xiaoyan Li.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

About this paper

Cite this paper

Zhang, G., Li, X., Wang, Z. (2022). Sequential Multi-view Fusion Network for Fast LiDAR Point Motion Estimation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13682. Springer, Cham. https://doi.org/10.1007/978-3-031-20047-2_17

DOI: https://doi.org/10.1007/978-3-031-20047-2_17
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20046-5
Online ISBN: 978-3-031-20047-2
eBook Packages: Computer Science, Computer Science (R0)