Abstract
This paper proposes a new neural network architecture for estimating the four-degree-of-freedom (4-DoF) poses of vehicles from monocular images captured in uncontrolled environments. The network learns to reconstruct characteristic 3D points of vehicles from image crops and from the coordinates of 2D keypoints estimated in these images. The 3D and 2D points are then used to compute the vehicle pose by solving the Perspective-n-Point (PnP) problem, while the uncertainty is propagated by applying the Unscented Transform. The network is trained and tested on the ApolloCar3D dataset, for which we also introduce a novel method of automatically obtaining approximate labels for the 3D points. Our system outperforms state-of-the-art pose estimation methods on ApolloCar3D and, unlike its competitors, implements a full uncertainty propagation pipeline.
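The geometric back end described above (PnP on 2D/3D correspondences, with keypoint uncertainty pushed through the Unscented Transform) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a pinhole camera with known intrinsics `K`, a 4-DoF pose parameterized as a yaw angle about the camera y-axis plus a translation (tx, ty, tz), and Gaussian noise on the flattened 2D keypoints. The PnP step is a simple Gauss-Newton minimization of the reprojection error with a finite-difference Jacobian, standing in for whatever solver the paper uses, and the UT is the basic symmetric sigma-point form with parameter `kappa`. All function names are hypothetical, and the 3D points are synthetic box corners rather than network predictions.

```python
import numpy as np


def rot_y(theta):
    """Rotation about the camera y-axis (assumed to be the vehicle's 'up' axis)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project(pose, pts3d, K):
    """Pinhole projection of object-frame points for pose = (yaw, tx, ty, tz)."""
    cam = pts3d @ rot_y(pose[0]).T + pose[1:]  # object frame -> camera frame
    uvw = cam @ K.T                            # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]


def solve_pnp_4dof(pts2d, pts3d, K, init, iters=50, eps=1e-6):
    """Hypothetical 4-DoF PnP: Gauss-Newton on the reprojection error
    with a forward finite-difference Jacobian."""
    pose = np.asarray(init, dtype=float).copy()
    for _ in range(iters):
        r = (project(pose, pts3d, K) - pts2d).ravel()
        J = np.empty((r.size, 4))
        for j in range(4):
            d = np.zeros(4)
            d[j] = eps
            J[:, j] = ((project(pose + d, pts3d, K) - pts2d).ravel() - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        pose += step
        if np.linalg.norm(step) < 1e-10:
            break
    return pose


def unscented_pose(pts2d, cov2d, pts3d, K, init, kappa=0.0):
    """Propagate Gaussian keypoint noise through the PnP solver with the UT.

    cov2d is the (2N, 2N) covariance of the flattened keypoints (u1, v1, u2, ...).
    """
    x = pts2d.ravel()
    n = x.size
    L = np.linalg.cholesky((n + kappa) * cov2d)  # columns = sigma-point offsets
    sigmas = [x] + [x + L[:, i] for i in range(n)] + [x - L[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    # Push every sigma point through the (nonlinear) pose solver.
    poses = np.array([solve_pnp_4dof(s.reshape(-1, 2), pts3d, K, init) for s in sigmas])
    mean = w @ poses  # plain weighted mean; acceptable while the yaw spread is small
    diff = poses - mean
    cov = (w[:, None] * diff).T @ diff
    return mean, cov


# Synthetic example: 8 corners of a car-sized box about 15 m in front of the camera.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
pts3d = np.array([[x, y, z] for x in (-0.9, 0.9) for y in (-0.75, 0.75) for z in (-2.25, 2.25)])
true_pose = np.array([0.3, 1.0, 0.5, 15.0])
pts2d = project(true_pose, pts3d, K)

est = solve_pnp_4dof(pts2d, pts3d, K, init=[0.2, 0.8, 0.4, 13.0])
# Assume an isotropic 1-pixel standard deviation on every keypoint coordinate.
mean, cov = unscented_pose(pts2d, np.eye(pts2d.size), pts3d, K, init=est)
print("estimated pose:", np.round(est, 4))
print("UT pose std:", np.round(np.sqrt(np.diag(cov)), 4))
```

With `kappa = 0` the central sigma point receives zero weight, and the symmetric pairs cancel first-order effects, so the UT mean stays close to the noise-free solution. Because yaw is averaged as an ordinary scalar here, a circular mean would be the safer choice when the angular spread is large.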
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nowak, T., Skrzypczyński, P. (2024). A Neural Network Architecture for Accurate 4D Vehicle Pose Estimation from Monocular Images with Uncertainty Assessment. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1962. Springer, Singapore. https://doi.org/10.1007/978-981-99-8132-8_30
DOI: https://doi.org/10.1007/978-981-99-8132-8_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8131-1
Online ISBN: 978-981-99-8132-8
eBook Packages: Computer Science, Computer Science (R0)