Abstract
Predicting the futures of surrounding agents is critical for autonomous systems such as self-driving cars. Instead of requiring accurate detection and tracking prior to trajectory prediction, an object-agnostic Sequential Pointcloud Forecasting (SPF) task was proposed [28], which enables a forecast-then-detect pipeline effective for downstream detection and trajectory prediction. One limitation of prior work is that it forecasts only a deterministic sequence of future point clouds, despite the inherent uncertainty of dynamic scenes. In this work, we tackle the stochastic SPF problem by proposing a generative model with two main components: (1) a conditional variational recurrent neural network that models a temporally-dependent latent space; (2) a pyramid-LSTM that increases the fidelity of predictions with temporally-aligned skip connections. Through experiments on real-world autonomous driving datasets, our stochastic SPF model produces higher-fidelity predictions, reducing Chamfer distances by up to 56.6% compared to its deterministic counterpart. In addition, our model can estimate the uncertainty of predicted points, which can be helpful to downstream tasks.
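The Chamfer distance used above to measure prediction fidelity can be sketched as follows. This is a minimal numpy illustration of the standard symmetric form (average nearest-neighbor squared distance in both directions), not the paper's evaluation code; the function name, shapes, and the choice of squared distances are assumptions.

```python
import numpy as np

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer distance between two point clouds.

    pred: (N, 3) predicted points; gt: (M, 3) ground-truth points.
    Uses a dense (N, M) pairwise distance matrix, so it is only
    suitable for modestly sized clouds.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = pred[:, None, :] - gt[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Average nearest-neighbor distance in both directions.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

Identical clouds score zero, and the metric grows with misalignment, which is why a lower Chamfer distance indicates a higher-fidelity forecast.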
Notes
1. Our project website is at https://www.xinshuoweng.com/projects/S2Net.
2. An illustrative comparison between these works can be found in [6].
3. There might still be a small amount of information loss due to discretization, unless one uses a very high-resolution range map, as we do in the experiments.
4. Unlike RGB images, not every pixel in the range map is valid: some pixels have no corresponding 3D point because the LiDAR beam produced no return in that direction, e.g., when the beam points toward the sky.
5. Note that the numbers originally reported in [28] differ due to various changes, including improved network architectures, a balanced data split, and an improved implementation of the metrics.
6. Even for KITTI, [21] evaluated only on the odometry dataset with 22 sequences, i.e., a subset of the raw dataset, so we re-trained their model on the full KITTI raw dataset using the official code and KITTI configuration files.
7. We did not aggregate point clouds along the temporal dimension.
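Notes 3 and 4 above concern rasterizing a LiDAR sweep into a range map, where discretization can merge points and some pixels receive no return. The sketch below illustrates one common spherical projection with a validity mask; the resolution, vertical field of view, and function name are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def points_to_range_map(points, h=64, w=1024, fov_up=10.0, fov_down=-30.0):
    """Project an (N, 3) LiDAR point cloud into a range map.

    Returns (range_map, valid): an (h, w) float map of ranges and an
    (h, w) boolean mask. Pixels with no LiDAR return (e.g., beams
    pointing at the sky, as in Note 4) remain invalid.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = np.clip(((fov_up_r - pitch) / (fov_up_r - fov_down_r) * h).astype(int),
                0, h - 1)
    range_map = np.zeros((h, w), dtype=np.float32)
    valid = np.zeros((h, w), dtype=bool)
    # Discretization: several points can land in one pixel (Note 3);
    # writing far points first keeps the nearest return per pixel.
    order = np.argsort(-r)
    range_map[v[order], u[order]] = r[order]
    valid[v[order], u[order]] = True
    return range_map, valid
```

With a sufficiently high resolution (large `h`, `w`), fewer points collide in the same pixel, which is why the experiments use high-resolution range maps to limit discretization loss.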
References
KITTI Raw Dataset. http://www.cvlibs.net/datasets/kitti/raw_data.php
nuScenes Prediction Data Split. https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/eval/prediction/splits.py
Bayer, J., Osendorfer, C.: Learning Stochastic Recurrent Networks. arXiv:1411.7610 (2014)
Bei, X., Yang, Y., Soatto, S.: Learning Semantic-Aware Dynamics for Video Prediction. CVPR (2021)
Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q.: nuScenes: A Multimodal Dataset for Autonomous Driving. CVPR (2020)
Castrejón, L., Ballas, N., Courville, A.C.: Improved Conditional VRNNs for Video Prediction. ICCV (2019)
Chatterjee, M., Ahuja, N., Cherian, A.: A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction. ICCV (2021)
Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A.C., Bengio, Y.: A Recurrent Latent Variable Model for Sequential Data. NIPS (2015)
Deng, D., Zakhor, A.: Temporal LiDAR Frame Prediction for Autonomous Driving. 3DV (2020)
Denton, E., Fergus, R.: Stochastic Video Generation with a Learned Prior. ICML (2018)
Fan, H., Su, H., Guibas, L.: A Point Set Generation Network for 3D Object Reconstruction from a Single Image. CVPR (2017)
Fan, H., Yang, Y.: PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing. arXiv:1910.08287 (2019)
Geiger, A., Lenz, P., Urtasun, R.: Are We Ready for Autonomous Driving? the KITTI Vision Benchmark Suite. CVPR (2012)
Gomes, P., Rossi, S., Toni, L.: Spatio-Temporal Graph-RNN for Point Cloud Prediction. ICIP (2021)
Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., Alahi, A.: Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks. CVPR (2018)
Kingma, D.P., Welling, M.: Auto-Encoding Variational Bayes. arXiv:1312.6114 (2013)
Klokov, R., Verbeek, J.J., Boyer, E.: Probabilistic Reconstruction Networks for 3D Shape Inference from a Single Image. BMVC (2019)
Liang, J., Jiang, L., Murphy, K., Yu, T., Hauptmann, A.: The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction. CVPR (2020)
Lin, C.H., Kong, C., Lucey, S.: Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction. AAAI (2018)
Liu, B., Chen, Y., Liu, S., Kim, H.S.: Deep Learning in Latent Space for Video Prediction and Compression. CVPR (2021)
Mersch, B., Chen, X., Behley, J., Stachniss, C.: Self-Supervised Point Cloud Prediction Using 3D Spatio-temporal Convolutional Networks. CoRL (2021)
Min, Y., Zhang, Y., Chai, X., Chen, X.: An Efficient PointLSTM for Point Clouds Based Gesture Recognition. CVPR (2020)
Nair, S., Savarese, S., Finn, C.: Goal-Aware Prediction: Learning to Model What Matters. ICML (2020)
Rubner, Y., Tomasi, C., Guibas, L.J.: The Earth Mover’s Distance as a Metric for Image Retrieval. IJCV (2000)
Salzmann, T., Ivanovic, B., Chakravarty, P., Pavone, M.: Trajectron++: Dynamically-Feasible Trajectory Forecasting with Heterogeneous Data. ECCV (2020). https://doi.org/10.1007/978-3-030-58523-5_40
Shu, D.W., Park, S.W., Kwon, J.: 3D Point Cloud Generative Adversarial Network Based on Tree Structured Graph Convolutions. ICCV (2019)
Sun, X., Wang, S., Wang, M., Wang, Z., Liu, M.: A Novel Coding Architecture for LiDAR Point Cloud Sequence. RA-L (2020)
Weng, X., Wang, J., Levine, S., Kitani, K., Rhinehart, N.: Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL (2020)
Weng, X., Yuan, Y., Kitani, K.: PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. Robot. Autom. Lett. 6(3), 4640–4647 (2021)
Wu, B., Nair, S., Martin-Martin, R., Fei-Fei, L., Finn, C.: Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction. CVPR (2021)
Wu, Y., Gao, R., Park, J., Chen, Q.: Future Video Synthesis with Object Motion Prediction. CVPR (2020)
Yang, G., Huang, X., Hao, Z., Liu, M.Y., Belongie, S.J., Hariharan, B.: PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows. ICCV (2019)
Yuan, Y., Weng, X., Ou, Y., Kitani, K.: AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting. ICCV (2021)
Zhang, C., Fiore, M., Murray, I., Patras, P.: CloudLSTM: A Recurrent Neural Model for Spatiotemporal Point-cloud Stream Forecasting. AAAI (2021)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Weng, X. et al. (2022). S2Net: Stochastic Sequential Pointcloud Forecasting. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13687. Springer, Cham. https://doi.org/10.1007/978-3-031-19812-0_32
DOI: https://doi.org/10.1007/978-3-031-19812-0_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19811-3
Online ISBN: 978-3-031-19812-0