Abstract
LiDAR point motion estimation, which covers motion state prediction and velocity estimation, is crucial for understanding dynamic scenes in autonomous driving. Recent 2D projection-based methods run in real time by applying well-optimized 2D convolutional networks to either the bird’s-eye view (BEV) or the range view (RV), but they suffer from lower accuracy because information is lost during the 2D projection. We therefore propose a novel sequential multi-view fusion network (SMVF), composed of a BEV branch and an RV branch that encode motion information and spatial information, respectively. By looking from distinct views and integrating the original LiDAR point features, SMVF produces a comprehensive motion prediction while remaining efficient. Moreover, to generalize motion estimation to objects with fewer training samples, we propose sequential instance copy-paste (SICP), which generates realistic LiDAR sequences for these objects. Experiments on the SemanticKITTI moving object segmentation (MOS) and Waymo scene flow benchmarks demonstrate that SMVF outperforms all existing methods by a large margin.
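To make the two projections concrete, the following is a minimal sketch of how a single LiDAR point might be mapped to a BEV grid cell and to an RV image pixel. It is not the authors' implementation: the function names `bev_cell` and `rv_pixel`, the grid extent, the cell size, the image resolution, and the vertical field of view are all illustrative assumptions.

```python
import math


def bev_cell(x, y, x_min=-50.0, y_min=-50.0, cell=0.5, width=200, height=200):
    """Map a point to a (row, col) cell of a top-down grid.

    The grid extent and cell size are assumed values, not taken from the paper.
    Returns None when the point falls outside the grid.
    """
    col = math.floor((x - x_min) / cell)
    row = math.floor((y - y_min) / cell)
    if 0 <= col < width and 0 <= row < height:
        return row, col
    return None


def rv_pixel(x, y, z, height=64, width=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map a point to a (row, col) pixel of a spherical range image.

    The image size and vertical field of view are assumed values.
    Returns None for the origin or for points outside the vertical field of view.
    """
    rng = math.sqrt(x * x + y * y + z * z)
    if rng == 0.0:
        return None
    yaw = math.atan2(y, x)                            # azimuth in (-pi, pi]
    pitch = math.asin(max(-1.0, min(1.0, z / rng)))   # elevation angle
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    # Azimuth selects the column; clamp the wrap-around edge into the image.
    col = min(width - 1, math.floor(0.5 * (1.0 - yaw / math.pi) * width))
    # Elevation selects the row, with row 0 at the top of the field of view.
    row = math.floor((fov_up - pitch) / (fov_up - fov_down) * height)
    if 0 <= row < height:
        return row, col
    return None
```

Under these assumed parameters, several points can share one BEV cell or one RV pixel, which illustrates the information loss that the abstract attributes to 2D projection and that fusing both views with the original point features is meant to offset.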
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, G., Li, X., Wang, Z. (2022). Sequential Multi-view Fusion Network for Fast LiDAR Point Motion Estimation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13682. Springer, Cham. https://doi.org/10.1007/978-3-031-20047-2_17
DOI: https://doi.org/10.1007/978-3-031-20047-2_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20046-5
Online ISBN: 978-3-031-20047-2
eBook Packages: Computer Science, Computer Science (R0)