
A Unified Pipeline for 3D Detection and Velocity Estimation of Vehicles

  • Conference paper
  • Robotics Research (ISRR 2019)

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 20)

Abstract

3D object detection and velocity estimation are useful for situation awareness and decision making in autonomous vehicles. However, much of the current research focuses on 2D detection and does not support velocity estimation. In this paper, we propose a unified pipeline that detects vehicles in 3D space and estimates their velocities simultaneously. The pipeline consists of a 2D detection net, a 3D proposal generation process, a fusion net, and an association net. The 2D detection net provides initial information for the 3D proposal generation process, which generates high-quality 3D box proposals in the point cloud. The fusion net, with inputs from multiple views, classifies and refines the 3D box proposals. The association net tracks the detected vehicles and estimates their velocities based on the 3D information. The pipeline is tested on the KITTI dataset for both 3D detection and tracking tasks. Our algorithm achieves combined 3D detection and velocity estimation performance comparable to the state of the art in 3D detection and in object tracking. It is the first pipeline to combine these tasks through a deep learning framework.
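The four-stage flow described in the abstract (2D detection net → 3D proposal generation → fusion net → association net) can be sketched schematically. This is a minimal illustration only; all class and method names below are hypothetical placeholders, not the authors' actual interfaces.

```python
# Schematic of the four-stage pipeline described in the abstract.
# Every name here is a placeholder assumption for illustration.

class UnifiedPipeline:
    def __init__(self, detector_2d, proposal_gen, fusion_net, association_net):
        self.detector_2d = detector_2d          # 2D detection net (camera image)
        self.proposal_gen = proposal_gen        # 3D proposal generation (point cloud)
        self.fusion_net = fusion_net            # multi-view classification + refinement
        self.association_net = association_net  # tracking + velocity estimation

    def step(self, image, point_cloud, track_state):
        # 2D boxes seed the 3D proposal generation in the point cloud.
        boxes_2d = self.detector_2d(image)
        proposals_3d = self.proposal_gen(boxes_2d, point_cloud)
        # Fusion net classifies and refines the 3D proposals.
        boxes_3d = self.fusion_net(proposals_3d)
        # Association net links detections across frames and estimates velocities.
        tracks = self.association_net(boxes_3d, track_state)
        return boxes_3d, tracks
```

The sketch only captures the data flow between stages; the paper's actual networks, inputs, and training details are not reproduced here.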



Acknowledgments

This research was supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its CREATE programme, Singapore-MIT Alliance for Research and Technology (SMART) Future Urban Mobility (FM) IRG. We are grateful for their support. This work has received funding in part from Toyota Research Institute within the Toyota-CSAIL joint research center. The views expressed in this paper solely reflect the opinions and conclusions of its authors and not TRI or any other Toyota entity.

Author information


Corresponding author

Correspondence to Xinxin Du.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 10041 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Du, X., Ang, M.H., Karaman, S., Rus, D. (2022). A Unified Pipeline for 3D Detection and Velocity Estimation of Vehicles. In: Asfour, T., Yoshida, E., Park, J., Christensen, H., Khatib, O. (eds) Robotics Research. ISRR 2019. Springer Proceedings in Advanced Robotics, vol 20. Springer, Cham. https://doi.org/10.1007/978-3-030-95459-8_30
