
S2Net: Stochastic Sequential Pointcloud Forecasting

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13687)

Abstract

Predicting the future of surrounding agents is critical for autonomous systems such as self-driving cars. Instead of requiring accurate detection and tracking prior to trajectory prediction, an object-agnostic Sequential Pointcloud Forecasting (SPF) task was proposed [28], which enables a forecast-then-detect pipeline effective for downstream detection and trajectory prediction. One limitation of prior work is that it forecasts only a deterministic sequence of future point clouds, despite the inherent uncertainty of dynamic scenes. In this work, we tackle the stochastic SPF problem by proposing a generative model with two main components: (1) a conditional variational recurrent neural network that models a temporally-dependent latent space; (2) a pyramid-LSTM that increases the fidelity of predictions with temporally-aligned skip connections. Through experiments on real-world autonomous driving datasets, our stochastic SPF model produces higher-fidelity predictions, reducing Chamfer distances by up to 56.6% compared to its deterministic counterpart. In addition, our model can estimate the uncertainty of predicted points, which can be helpful to downstream tasks.
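
The Chamfer distance referenced above measures how well a predicted point cloud matches the ground truth by averaging nearest-neighbor distances in both directions. As a minimal PyTorch sketch of one common symmetric formulation (the paper's exact normalization may differ):

    import torch

    def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
        """Symmetric Chamfer distance between two point clouds.

        pred: (N, 3) predicted points; gt: (M, 3) ground-truth points.
        """
        # Pairwise squared Euclidean distances, shape (N, M).
        d2 = torch.cdist(pred, gt, p=2).pow(2)
        # Nearest ground-truth point for each prediction, and vice versa.
        return d2.min(dim=1).values.mean() + d2.min(dim=0).values.mean()

    # Example: a 1024-point prediction against a 2048-point ground truth.
    pred, gt = torch.randn(1024, 3), torch.randn(2048, 3)
    print(chamfer_distance(pred, gt))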


Notes

  1.

    Our project website is at https://www.xinshuoweng.com/projects/S2Net.

  2.

    An illustrative comparison between these works can be found in [6].

  3.

    There may still be a small amount of information loss due to discretization, unless one uses a very high-resolution range map, as we do in the experiments.

  4.

    Unlike RGB images, not every pixel in the range map is valid: a pixel has no corresponding 3D point when the LiDAR beam produces no return in that direction, e.g., when the beam shoots toward the sky. A minimal projection sketch follows these notes.

  5.

    Note that the numbers originally reported in [28] differ due to several changes, including improved network architectures, a balanced data split, and an improved implementation of the metrics.

  6.

    Even for KITTI, [21] evaluated only on the odometry dataset, which contains just 22 sequences, i.e., a subset of the raw dataset, so we re-trained their model on the full KITTI raw dataset using the official code and KITTI configuration files.

  7.

    We did not aggregate point clouds along the temporal dimension.
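
Footnotes 3 and 4 above concern the range-map representation: a LiDAR sweep is discretized by spherical projection into a 2D grid, and pixels with no LiDAR return are marked invalid. The following NumPy sketch illustrates such a projection; the image resolution and vertical field-of-view values are illustrative assumptions, not the paper's settings:

    import numpy as np

    def to_range_image(points: np.ndarray, h: int = 64, w: int = 2048,
                       fov_up_deg: float = 3.0, fov_down_deg: float = -25.0):
        """Project an (N, 3) point cloud into an (h, w) range image
        plus a boolean validity mask (False where no LiDAR return)."""
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points, axis=1)                 # range per point
        yaw = np.arctan2(y, x)                             # azimuth in [-pi, pi]
        pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))
        fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)

        # Normalize angles to [0, 1), then discretize to pixel indices.
        col = np.clip((0.5 * (1.0 - yaw / np.pi) * w).astype(int), 0, w - 1)
        row = np.clip(((fov_up - pitch) / (fov_up - fov_down) * h).astype(int),
                      0, h - 1)

        range_img = np.zeros((h, w), dtype=np.float32)
        valid = np.zeros((h, w), dtype=bool)
        range_img[row, col] = r                            # last point wins per pixel
        valid[row, col] = True
        return range_img, valid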

References

  1. KITTI raw dataset. http://www.cvlibs.net/datasets/kitti/raw_data.php

  2. nuScenes prediction data split. https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/eval/prediction/splits.py

  3. Bayer, J., Osendorfer, C.: Learning stochastic recurrent networks. arXiv:1411.7610 (2014)

  4. Bei, X., Yang, Y., Soatto, S.: Learning semantic-aware dynamics for video prediction. In: CVPR (2021)

  5. Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q.: nuScenes: a multimodal dataset for autonomous driving. In: CVPR (2020)

  6. Castrejón, L., Ballas, N., Courville, A.C.: Improved conditional VRNNs for video prediction. In: ICCV, pp. 7607–7616 (2019)

  7. Chatterjee, M., Ahuja, N., Cherian, A.: A hierarchical variational neural uncertainty model for stochastic video prediction. In: ICCV (2021)

  8. Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A.C., Bengio, Y.: A recurrent latent variable model for sequential data. In: NIPS (2015)

  9. Deng, D., Zakhor, A.: Temporal LiDAR frame prediction for autonomous driving. In: 3DV (2020)

  10. Denton, E., Fergus, R.: Stochastic video generation with a learned prior. In: ICML (2018)

  11. Fan, H., Su, H., Guibas, L.: A point set generation network for 3D object reconstruction from a single image. In: CVPR (2017)

  12. Fan, H., Yang, Y.: PointRNN: point recurrent neural network for moving point cloud processing. arXiv:1910.08287 (2019)

  13. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR (2012)

  14. Gomes, P., Rossi, S., Toni, L.: Spatio-temporal graph-RNN for point cloud prediction. In: ICIP (2021)

  15. Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., Alahi, A.: Social GAN: socially acceptable trajectories with generative adversarial networks. In: CVPR (2018)

  16. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv:1312.6114 (2013)

  17. Klokov, R., Verbeek, J.J., Boyer, E.: Probabilistic reconstruction networks for 3D shape inference from a single image. In: BMVC (2019)

  18. Liang, J., Jiang, L., Murphy, K., Yu, T., Hauptmann, A.: The garden of forking paths: towards multi-future trajectory prediction. In: CVPR (2020)

  19. Lin, C.H., Kong, C., Lucey, S.: Learning efficient point cloud generation for dense 3D object reconstruction. In: AAAI (2018)

  20. Liu, B., Chen, Y., Liu, S., Kim, H.S.: Deep learning in latent space for video prediction and compression. In: CVPR (2021)

  21. Mersch, B., Chen, X., Behley, J., Stachniss, C.: Self-supervised point cloud prediction using 3D spatio-temporal convolutional networks. In: CoRL (2021)

  22. Min, Y., Zhang, Y., Chai, X., Chen, X.: An efficient PointLSTM for point clouds based gesture recognition. In: CVPR (2020)

  23. Nair, S., Savarese, S., Finn, C.: Goal-aware prediction: learning to model what matters. In: ICML (2020)

  24. Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover's distance as a metric for image retrieval. IJCV (2000)

  25. Salzmann, T., Ivanovic, B., Chakravarty, P., Pavone, M.: Trajectron++: dynamically-feasible trajectory forecasting with heterogeneous data. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12363, pp. 683–700. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58523-5_40

  26. Shu, D.W., Park, S.W., Kwon, J.: 3D point cloud generative adversarial network based on tree structured graph convolutions. In: ICCV, pp. 3858–3867 (2019)

  27. Sun, X., Wang, S., Wang, M., Wang, Z., Liu, M.: A novel coding architecture for LiDAR point cloud sequence. RA-L (2020)

  28. Weng, X., Wang, J., Levine, S., Kitani, K., Rhinehart, N.: Inverting the pose forecasting pipeline with SPF2: sequential pointcloud forecasting for sequential pose forecasting. In: CoRL (2020)

  29. Weng, X., Yuan, Y., Kitani, K.: PTP: parallelized tracking and prediction with graph neural networks and diversity sampling. IEEE Robot. Autom. Lett. 6(3), 4640–4647 (2021)

  30. Wu, B., Nair, S., Martin-Martin, R., Fei-Fei, L., Finn, C.: Greedy hierarchical variational autoencoders for large-scale video prediction. In: CVPR (2021)

  31. Wu, Y., Gao, R., Park, J., Chen, Q.: Future video synthesis with object motion prediction. In: CVPR (2020)

  32. Yang, G., Huang, X., Hao, Z., Liu, M.Y., Belongie, S.J., Hariharan, B.: PointFlow: 3D point cloud generation with continuous normalizing flows. In: ICCV, pp. 4540–4549 (2019)

  33. Yuan, Y., Weng, X., Ou, Y., Kitani, K.: AgentFormer: agent-aware transformers for socio-temporal multi-agent forecasting. In: ICCV (2021)

  34. Zhang, C., Fiore, M., Murray, I., Patras, P.: CloudLSTM: a recurrent neural model for spatiotemporal point-cloud stream forecasting. In: AAAI (2021)


Author information


Corresponding author

Correspondence to Xinshuo Weng.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Weng, X. et al. (2022). S2Net: Stochastic Sequential Pointcloud Forecasting. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13687. Springer, Cham. https://doi.org/10.1007/978-3-031-19812-0_32


  • DOI: https://doi.org/10.1007/978-3-031-19812-0_32


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19811-3

  • Online ISBN: 978-3-031-19812-0

  • eBook Packages: Computer Science, Computer Science (R0)
