Abstract
In the near future, a self-driving car will be able to perceive and understand its surroundings by composing a 3D environment map at the object level, in which the 3D shapes of surrounding objects are precisely reconstructed. This paper presents a technique for reconstructing 3D object shapes using a monocular camera and a Lidar. The proposed approach combines deep neural networks with an optimization process called 3D Shaping, in which object pose and shape are jointly optimized. Quantitative evaluation shows that the proposed approach significantly improves the estimation of object 3D orientation and the occupancy bounding box.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Rao, Q., Krüger, L., Dietmayer, K. (2017). 3D Shape Reconstruction in Traffic Scenarios Using Monocular Camera and Lidar. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10117. Springer, Cham. https://doi.org/10.1007/978-3-319-54427-4_1
DOI: https://doi.org/10.1007/978-3-319-54427-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54426-7
Online ISBN: 978-3-319-54427-4
eBook Packages: Computer Science, Computer Science (R0)