Abstract
Detecting objects in 3D LiDAR data is a core technology for autonomous driving and other robotics applications. Although LiDAR data is acquired over time, most 3D object detection algorithms propose object bounding boxes independently for each frame and neglect the useful information available in the temporal domain. To address this problem, in this paper we propose a sparse LSTM-based multi-frame 3D object detection algorithm. We use a U-Net-style 3D sparse convolution network to extract features from each frame's LiDAR point cloud. These features are fed to the LSTM module together with the hidden and memory features from the previous frame to predict the 3D objects in the current frame, as well as the hidden and memory features that are passed on to the next frame. Experiments on the Waymo Open Dataset show that our algorithm outperforms the traditional frame-by-frame approach by 7.5% mAP@0.7 and other multi-frame approaches by 1.2% while using less memory and computation per frame. To the best of our knowledge, this is the first work to use an LSTM for 3D object detection in sparse point clouds.
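The recurrence the abstract describes — per-frame features entering an LSTM together with the hidden and memory state from the previous frame — can be sketched with a standard LSTM cell [30]. This is a minimal NumPy illustration of that state-passing pattern, not the paper's implementation: the feature extractor is replaced by random vectors, and all weight names (`W["i"]`, `W["f"]`, etc.) and dimensions are hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_cell(x, h, c, W):
    """One step of a standard LSTM cell (Hochreiter & Schmidhuber, 1997).
    x: per-frame feature vector; h, c: hidden and memory state carried
    over from the previous frame; W: gate weight matrices (hypothetical)."""
    z = np.concatenate([x, h])
    i = sigmoid(W["i"] @ z)   # input gate
    f = sigmoid(W["f"] @ z)   # forget gate
    o = sigmoid(W["o"] @ z)   # output gate
    g = np.tanh(W["g"] @ z)   # candidate memory
    c_next = f * c + i * g    # updated memory state
    h_next = o * np.tanh(c_next)  # updated hidden state
    return h_next, c_next

# Per-frame loop: a backbone (a random stand-in here) would extract features
# for each LiDAR frame; the LSTM carries hidden/memory state between frames.
rng = np.random.default_rng(0)
D, H = 8, 4  # feature and hidden dimensions (illustrative only)
W = {k: rng.standard_normal((H, D + H)) * 0.1 for k in "ifog"}
h, c = np.zeros(H), np.zeros(H)
for frame_features in rng.standard_normal((3, D)):  # 3 frames
    h, c = lstm_cell(frame_features, h, c, W)
    # in the paper, h would feed the detection head for this frame
print(h.shape)  # (4,)
```

In the actual method the per-frame input comes from a sparse 3D convolutional U-Net and the detection head consumes the hidden features; only the cross-frame state flow is shown here.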
Notes
1. Using code released by [44].
References
Behl, A., Hosseini Jafari, O., Karthik Mustikovela, S., Abu Alhaija, H., Rother, C., Geiger, A.: Bounding boxes, segmentations and object coordinates: how important is recognition for 3D scene flow estimation in autonomous driving scenarios? In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2574–2583 (2017)
Behl, A., Paschalidou, D., Donné, S., Geiger, A.: PointFlowNet: learning representations for rigid motion estimation from point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7962–7971 (2019)
Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027 (2019)
Casas, S., Luo, W., Urtasun, R.: IntentNet: learning to predict intention from raw sensor data. In: Conference on Robot Learning, pp. 947–956 (2018)
Chang, M.F., et al.: Argoverse: 3D tracking and forecasting with rich maps. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8748–8757 (2019)
Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3D object detection network for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1907–1915 (2017)
Chen, X., Yu, J., Wu, Z.: Temporally identity-aware SSD with attentional LSTM. IEEE Trans. Cybern. 50(6), 2674–2686 (2019)
Chiu, H.k., Prioletti, A., Li, J., Bohg, J.: Probabilistic 3D multi-object tracking for autonomous driving. arXiv preprint arXiv:2001.05673 (2020)
Choy, C., Gwak, J., Savarese, S.: 4D spatio-temporal convnets: Minkowski convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3075–3084 (2019)
Dosovitskiy, A., et al.: FlowNet: learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2758–2766 (2015)
El Sallab, A., Sobh, I., Zidan, M., Zahran, M., Abdelkarim, S.: YOLO4D: a spatio-temporal approach for real-time multi-object detection and classification from LiDAR point clouds. In: NIPS 2018 Workshop MLITS (2018)
Engelcke, M., Rao, D., Wang, D.Z., Tong, C.H., Posner, I.: Vote3Deep: fast object detection in 3D point clouds using efficient convolutional neural networks. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1355–1361. IEEE (2017)
Fan, H., Yang, Y.: PointRNN: point recurrent neural network for moving point cloud processing. arXiv preprint arXiv:1910.08287 (2019)
Feng, Y., Ma, L., Liu, W., Luo, J.: Spatio-temporal video re-localization by warp LSTM. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1288–1297 (2019)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361. IEEE (2012)
Graham, B.: Sparse 3D convolutional neural networks. In: Xie, X., Jones, M.W., Tam, G.K.L. (eds.) Proceedings of the British Machine Vision Conference (BMVC), pp. 150.1–150.9. BMVA Press, September 2015. https://doi.org/10.5244/C.29.150
Graham, B., Engelcke, M., van der Maaten, L.: 3D semantic segmentation with submanifold sparse convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9224–9232 (2018)
Graham, B., van der Maaten, L.: Submanifold sparse convolutional networks. arXiv preprint arXiv:1706.01307 (2017)
Groueix, T., Fisher, M., Kim, V.G., Russell, B.C., Aubry, M.: AtlasNet: a papier-mâché approach to learning 3D surface generation. arXiv preprint arXiv:1802.05384 (2018)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Hu, P., Ziglar, J., Held, D., Ramanan, D.: What you see is what you get: exploiting visibility for 3D object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11001–11009 (2020)
Huang, L., Yan, P., Li, G., Wang, Q., Lin, L.: Attention embedded spatio-temporal network for video salient object detection. IEEE Access 7, 166203–166213 (2019)
Kanazawa, A., Zhang, J.Y., Felsen, P., Malik, J.: Learning 3D human dynamics from video. In: Computer Vision and Pattern Regognition (CVPR) (2019)
Kang, K., et al.: Object detection in videos with tubelet proposal networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 727–735 (2017)
Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: PointPillars: fast encoders for object detection from point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12697–12705 (2019)
Li, B., Zhang, T., Xia, T.: Vehicle detection from 3D LiDAR using fully convolutional network. arXiv preprint arXiv:1608.07916 (2016)
Liang, M., Yang, B., Wang, S., Urtasun, R.: Deep continuous fusion for multi-sensor 3D object detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 641–656 (2018)
Liu, X., Qi, C.R., Guibas, L.J.: FlowNet3D: learning scene flow in 3D point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 529–537 (2019)
Liu, X., Yan, M., Bohg, J.: MeteorNet: deep learning on dynamic 3D point cloud sequences. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9246–9255 (2019)
Luo, W., Yang, B., Urtasun, R.: Fast and furious: real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 3569–3577 (2018)
McCrae, S., Zakhor, A.: 3D object detection using temporal lidar data. In: IEEE International Conference on Image Processing (ICIP) (2020)
Meyer, G.P., Laddha, A., Kee, E., Vallespi-Gonzalez, C., Wellington, C.K.: LaserNet: an efficient probabilistic 3D object detector for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12677–12686 (2019)
Najibi, M., et al.: DOPS: learning to detect 3D objects and predict their 3D shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020)
Ngiam, J., et al.: StarNet: targeted computation for object detection in point clouds. arXiv preprint arXiv:1908.11069 (2019)
Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep Hough voting for 3D object detection in point clouds. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9277–9286 (2019)
Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum pointnets for 3D object detection from RGB-D data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 918–927 (2018)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)
Riegler, G., Osman Ulusoy, A., Geiger, A.: OctNet: Learning deep 3D representations at high resolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3577–3586 (2017)
Shi, S., Wang, X., Li, H.: PointRCNN: 3D object proposal generation and detection from point cloud. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–779 (2019)
Simon, M., et al.: Complexer-YOLO: real-time 3D object detection and tracking on semantic point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2019)
Simon, M., Milz, S., Amende, K., Gross, H.-M.: Complex-YOLO: an Euler-region-proposal for real-time 3D object detection on point clouds. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11129, pp. 197–209. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11009-3_11
Sun, P., et al.: Scalability in perception for autonomous driving: Waymo open dataset. arXiv preprint arXiv:1912.04838 (2019)
Teng, E., Falcão, J.D., Huang, R., Iannucci, B.: ClickBAIT: click-based accelerated incremental training of convolutional neural networks. In: 2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), pp. 1–12. IEEE (2018)
Weng, X., Kitani, K.: A baseline for 3D multi-object tracking. arXiv preprint arXiv:1907.03961 (2019)
Xiao, F., Jae Lee, Y.: Video object detection with an aligned spatial-temporal memory. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 485–501 (2018)
Xu, Z., et al.: Unsupervised discovery of parts, structure, and dynamics. arXiv preprint arXiv:1903.05136 (2019)
Yang, B., Luo, W., Urtasun, R.: PIXOR: real-time 3D object detection from point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7652–7660 (2018)
Yang, Y., Feng, C., Shen, Y., Tian, D.: FoldingNet: point cloud auto-encoder via deep grid deformation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 206–215 (2018)
Yin, J., Shen, J., Guan, C., Zhou, D., Yang, R.: LiDAR-based online 3D video object detection with graph-based message passing and spatiotemporal transformer attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020
Zhao, H., Jiang, L., Fu, C.W., Jia, J.: PointWeb: enhancing local neighborhood features for point cloud processing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5565–5573 (2019)
Zhao, Y., Birdal, T., Deng, H., Tombari, F.: 3D point capsule networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1009–1018 (2019)
Zhou, Y., et al.: End-to-end multi-view fusion for 3D object detection in LiDAR point clouds. arXiv preprint arXiv:1910.06528 (2019)
Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4490–4499 (2018)
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, R. et al. (2020). An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12363. Springer, Cham. https://doi.org/10.1007/978-3-030-58523-5_16
Print ISBN: 978-3-030-58522-8
Online ISBN: 978-3-030-58523-5