3D Shape Reconstruction in Traffic Scenarios Using Monocular Camera and Lidar

Rao, Qing; Krüger, Lars; Dietmayer, Klaus

doi:10.1007/978-3-319-54427-4_1

Conference paper
First Online: 16 March 2017

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10117)

Abstract

In the near future, a self-driving car will be able to perceive and understand its surroundings by composing a 3D environment map at object level. In this map, the 3D shapes of surrounding objects will be precisely reconstructed. This paper presents a technique for reconstructing 3D object shapes using a monocular camera and a Lidar. The proposed approach combines deep neural networks with an optimization process called 3D Shaping, in which object pose and shape are jointly optimized. Quantitative evaluation shows that the proposed approach significantly improves the estimation of object 3D orientation and the occupancy bounding box.

Author information

Authors and Affiliations

Daimler AG, Ulm, Germany
Qing Rao & Lars Krüger
Ulm University, Ulm, Germany
Klaus Dietmayer

Corresponding author

Correspondence to Qing Rao.

Editor information

Editors and Affiliations

Institute of Information Science, Academia Sinica, Taipei, Taiwan
Chu-Song Chen
Tsinghua University, Beijing, China
Jiwen Lu
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Kai-Kuang Ma

Cite this paper

Rao, Q., Krüger, L., Dietmayer, K. (2017). 3D Shape Reconstruction in Traffic Scenarios Using Monocular Camera and Lidar. In: Chen, C.-S., Lu, J., Ma, K.-K. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10117. Springer, Cham. https://doi.org/10.1007/978-3-319-54427-4_1

DOI: https://doi.org/10.1007/978-3-319-54427-4_1
Published: 16 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54426-7
Online ISBN: 978-3-319-54427-4
eBook Packages: Computer Science, Computer Science (R0)
