3D Shape Reconstruction in Traffic Scenarios Using Monocular Camera and Lidar

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10117)

Abstract

In the near future, a self-driving car will be able to perceive and understand its surroundings by composing a 3D environment map at the object level, in which the 3D shapes of surrounding objects are precisely reconstructed. This paper presents a technique for reconstructing 3D object shapes using a monocular camera and a Lidar. The proposed approach combines deep neural networks with an optimization process called 3D Shaping, in which object pose and shape are jointly optimized. Quantitative evaluation shows that the approach significantly improves the estimation of object 3D orientation and the occupancy bounding box.

Author information

Corresponding author

Correspondence to Qing Rao.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Rao, Q., Krüger, L., Dietmayer, K. (2017). 3D Shape Reconstruction in Traffic Scenarios Using Monocular Camera and Lidar. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10117. Springer, Cham. https://doi.org/10.1007/978-3-319-54427-4_1

  • DOI: https://doi.org/10.1007/978-3-319-54427-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54426-7

  • Online ISBN: 978-3-319-54427-4

  • eBook Packages: Computer Science, Computer Science (R0)
