Abstract
3D object detection and velocity estimation are useful for situation awareness and decision making in autonomous vehicles. However, much of the current research focuses on 2D detection and does not support velocity estimation. In this paper, we propose a unified pipeline that detects vehicles in 3D space and estimates their velocities simultaneously. The pipeline consists of a 2D detection net, a 3D proposal generation process, a fusion net, and an association net. The 2D detection net provides initial information for the 3D proposal generation process, which generates high-quality 3D box proposals in the point cloud. The fusion net, with inputs from multiple views, classifies and refines the 3D box proposals. The association net tracks the detected vehicles and estimates their velocities based on the 3D information. The pipeline is tested on the KITTI dataset for both 3D detection and tracking tasks. Our algorithm achieves combined 3D object detection and velocity estimation performance comparable to the state of the art in 3D detection and in object tracking individually. It is the first pipeline that combines these tasks in a deep learning framework.
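The paper gives the full association details; as a minimal illustration of the final stage described above, once the association net has matched the same vehicle's 3D boxes across consecutive frames, its velocity can be estimated by finite differences on the box centers. The sketch below is an assumption-laden simplification (the function name and the 10 Hz frame rate are illustrative; KITTI sequences are captured at 10 Hz, but the paper's actual estimator may differ):

```python
import numpy as np

def estimate_velocity(prev_center, curr_center, dt):
    """Finite-difference velocity (m/s) between two associated
    3D box centers of the same tracked vehicle, dt seconds apart."""
    return (np.asarray(curr_center, dtype=float)
            - np.asarray(prev_center, dtype=float)) / dt

# Two detections of one vehicle, 0.1 s apart (10 Hz LIDAR/camera rate).
prev = (10.0, 0.0, 1.5)   # (x, y, z) box center at t-1, in metres
curr = (11.2, 0.1, 1.5)   # box center at t
v = estimate_velocity(prev, curr, dt=0.1)  # -> array([12., 1., 0.])
```

In practice a tracker would smooth such raw estimates (e.g. with a Kalman filter) rather than use a single frame pair.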
Acknowledgments
This research was supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its CREATE programme, Singapore-MIT Alliance for Research and Technology (SMART) Future Urban Mobility (FM) IRG. We are grateful for their support. This work has received funding in part from Toyota Research Institute within the Toyota-CSAIL joint research center. The views expressed in this paper solely reflect the opinions and conclusions of its authors and not TRI or any other Toyota entity.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Du, X., Ang, M.H., Karaman, S., Rus, D. (2022). A Unified Pipeline for 3D Detection and Velocity Estimation of Vehicles. In: Asfour, T., Yoshida, E., Park, J., Christensen, H., Khatib, O. (eds) Robotics Research. ISRR 2019. Springer Proceedings in Advanced Robotics, vol 20. Springer, Cham. https://doi.org/10.1007/978-3-030-95459-8_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95458-1
Online ISBN: 978-3-030-95459-8
eBook Packages: Intelligent Technologies and Robotics