
Future pseudo-LiDAR frame prediction for autonomous driving

  • Regular Paper
  • Published in Multimedia Systems

Abstract

LiDAR sensors are widely used in autonomous driving because they provide reliable 3D spatial information. However, LiDAR point clouds are sparse, and LiDAR operates at a lower frame rate than cameras. To generate point clouds that are denser both spatially and temporally, we propose the first future pseudo-LiDAR frame prediction network. Given consecutive sparse depth maps and RGB images, we first coarsely predict a future dense depth map from dynamic motion information. To suppress errors in optical flow estimation, an inter-frame aggregation module is proposed to fuse the warped depth maps with adaptive weights. We then refine the predicted dense depth map using static contextual information. The future pseudo-LiDAR frame is obtained by converting the predicted dense depth map into the corresponding 3D point cloud. Extensive experiments are conducted on depth completion, pseudo-LiDAR interpolation, and LiDAR prediction. Our approach achieves state-of-the-art performance on all of these tasks on the popular KITTI dataset, and the primary evaluation metric, RMSE, reaches 1214.
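
Two brief sketches illustrate the pipeline described above. First, the inter-frame aggregation step: past depth maps are warped toward the future frame with estimated optical flow and then fused with per-pixel adaptive weights. The PyTorch sketch below is a minimal illustration, not the paper's verified module; in particular, realising the adaptive weights as a softmax over learned per-pixel scores is our assumption, and the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def warp(depth, flow):
    """Warp a depth map (B x 1 x H x W) with a backward optical flow field
    (B x 2 x H x W, in pixels) from the target (future) frame to the source
    frame, using bilinear sampling."""
    _, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(depth.device)  # 2 x H x W
    coords = base.unsqueeze(0) + flow           # sampling positions in pixels
    # grid_sample expects coordinates normalised to [-1, 1]
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)        # B x H x W x 2
    return F.grid_sample(depth, grid, align_corners=True)

def aggregate(warped_depths, scores):
    """Fuse N warped depth maps with adaptive per-pixel weights so that
    regions with unreliable flow contribute less. `scores` (B x N x H x W)
    would come from a small CNN; softmax weighting is an assumption."""
    weights = torch.softmax(scores, dim=1)               # B x N x H x W
    stacked = torch.cat(warped_depths, dim=1)            # B x N x H x W
    return (weights * stacked).sum(dim=1, keepdim=True)  # B x 1 x H x W
```

Second, the conversion of the refined dense depth map into a pseudo-LiDAR frame. This step is the standard pinhole back-projection used by pseudo-LiDAR methods; the intrinsics in the usage comment are typical KITTI-like values, quoted for illustration only.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (H x W, metres) into a 3D point cloud
    in the camera frame using the pinhole model and intrinsics fx, fy, cx, cy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx                       # X = (u - cx) * Z / fx
    y = (v - cy) * depth / fy                       # Y = (v - cy) * Z / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                 # keep pixels with valid depth

# Illustrative usage with hypothetical KITTI-like intrinsics:
# cloud = depth_to_pseudo_lidar(pred_depth, fx=721.5, fy=721.5, cx=609.6, cy=172.9)
```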


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62172032).

Author information


Corresponding author

Correspondence to Chunyu Lin.

Additional information

Communicated by B-K Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Huang, X., Lin, C., Liu, H. et al. Future pseudo-LiDAR frame prediction for autonomous driving. Multimedia Systems 28, 1611–1620 (2022). https://doi.org/10.1007/s00530-022-00921-x

