
A Unified Pipeline for 3D Detection and Velocity Estimation of Vehicles

  • Conference paper
  • Robotics Research (ISRR 2019)

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 20)

Abstract

3D object detection and velocity estimation are useful for situation awareness and decision making in autonomous vehicles. However, much of the current research focuses on 2D detection and does not support velocity estimation. In this paper, we propose a unified pipeline that detects vehicles in 3D space and estimates their velocities simultaneously. The pipeline consists of a 2D detection net, a 3D proposal generation process, a fusion net, and an association net. The 2D detection net provides initial information for the 3D proposal generation process, which generates high-quality 3D box proposals in the point cloud. The fusion net, with inputs from multiple views, classifies and refines the 3D box proposals. The association net tracks the detected vehicles and estimates their velocities based on the 3D information. The pipeline is tested on the KITTI dataset for both 3D detection and tracking tasks. Our algorithm achieves combined 3D detection and velocity estimation performance comparable to the state of the art in 3D detection and in object tracking. It is the first pipeline to combine these tasks through a deep learning framework.
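The four-stage flow described in the abstract (2D detection net → 3D proposal generation → fusion net → association net) can be sketched schematically. This is a minimal illustration only; all class and method names below are hypothetical placeholders, not the authors' actual interfaces.

```python
# Schematic of the four-stage pipeline described in the abstract.
# Every name here is a placeholder assumption for illustration.

class UnifiedPipeline:
    def __init__(self, detector_2d, proposal_gen, fusion_net, association_net):
        self.detector_2d = detector_2d          # 2D detection net (camera image)
        self.proposal_gen = proposal_gen        # 3D proposal generation (point cloud)
        self.fusion_net = fusion_net            # multi-view classification + refinement
        self.association_net = association_net  # tracking + velocity estimation

    def step(self, image, point_cloud, track_state):
        # 2D boxes seed the 3D proposal generation in the point cloud.
        boxes_2d = self.detector_2d(image)
        proposals_3d = self.proposal_gen(boxes_2d, point_cloud)
        # Fusion net classifies and refines the 3D proposals.
        boxes_3d = self.fusion_net(proposals_3d)
        # Association net links detections across frames and estimates velocities.
        tracks = self.association_net(boxes_3d, track_state)
        return boxes_3d, tracks
```

The sketch only captures the data flow between stages; the paper's actual networks, inputs, and training details are not reproduced here.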



Acknowledgments

This research was supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its CREATE programme, Singapore-MIT Alliance for Research and Technology (SMART) Future Urban Mobility (FM) IRG. We are grateful for their support. This work has received funding in part from Toyota Research Institute within the Toyota-CSAIL joint research center. The views expressed in this paper solely reflect the opinions and conclusions of its authors and not TRI or any other Toyota entity.

Author information


Corresponding author

Correspondence to Xinxin Du.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 10041 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Du, X., Ang, M.H., Karaman, S., Rus, D. (2022). A Unified Pipeline for 3D Detection and Velocity Estimation of Vehicles. In: Asfour, T., Yoshida, E., Park, J., Christensen, H., Khatib, O. (eds) Robotics Research. ISRR 2019. Springer Proceedings in Advanced Robotics, vol 20. Springer, Cham. https://doi.org/10.1007/978-3-030-95459-8_30
