A Neural Network Architecture for Accurate 4D Vehicle Pose Estimation from Monocular Images with Uncertainty Assessment

Nowak, Tomasz; Skrzypczyński, Piotr

doi:10.1007/978-981-99-8132-8_30

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1962)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

This paper proposes a new neural network architecture for estimating the four-degree-of-freedom (4D) poses of vehicles from monocular images captured in uncontrolled environments. The network learns to reconstruct 3D characteristic points of vehicles from image crops and from the coordinates of 2D keypoints estimated in these images. The vehicle pose is then computed from the corresponding 3D and 2D points by solving the Perspective-n-Point (PnP) problem, while the uncertainty is propagated through this computation using the Unscented Transform. Our network is trained and tested on the ApolloCar3D dataset, for which we also introduce a novel method to automatically obtain approximate labels for the 3D points. Our system outperforms state-of-the-art pose estimation methods on ApolloCar3D and, unlike competing methods, implements a full uncertainty propagation pipeline.
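The last two steps of the pipeline (PnP pose recovery and Unscented Transform uncertainty propagation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The 4-DoF parameterization (yaw about the camera's y-axis, assumed vertical, plus a 3D translation), the Gauss-Newton PnP solver with a finite-difference Jacobian, the Unscented Transform settings (alpha = 1, beta = 2, kappa = 0), the box-shaped keypoints, the camera matrix, and the isotropic 2 px keypoint noise are all assumptions chosen for illustration. Each sigma point of the 2D keypoint distribution is pushed through a full PnP solve, and the resulting poses are recombined into a pose mean and covariance.

```python
import numpy as np


def rot_y(yaw):
    """Rotation about the camera y-axis (assumed to be the vertical axis)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project(pts3d, pose, K):
    """Project object-frame points under a 4-DoF pose (yaw, tx, ty, tz)."""
    cam = pts3d @ rot_y(pose[0]).T + pose[1:]  # object frame -> camera frame
    uvw = cam @ K.T                             # pinhole camera model
    return uvw[:, :2] / uvw[:, 2:3]


def solve_pnp_4dof(pts3d, pts2d, K, init, iters=30, eps=1e-6):
    """Minimize reprojection error by Gauss-Newton with a finite-difference Jacobian."""
    pose = np.array(init, dtype=float)
    for _ in range(iters):
        r = (project(pts3d, pose, K) - pts2d).ravel()
        J = np.empty((r.size, 4))
        for j in range(4):
            d = np.zeros(4)
            d[j] = eps
            J[:, j] = ((project(pts3d, pose + d, K) - pts2d).ravel() - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        pose += step
        if np.linalg.norm(step) < 1e-12:
            break
    return pose


def unscented_transform(f, mean, cov, alpha=1.0, beta=2.0, kappa=0.0):
    """Propagate a Gaussian N(mean, cov) through f using 2n+1 sigma points."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * cov)  # columns are the sigma-point offsets
    sigmas = [mean] + [mean + L[:, i] for i in range(n)] + [mean - L[:, i] for i in range(n)]
    wm = np.full(2 * n + 1, 0.5 / (n + lam))  # mean weights
    wc = wm.copy()                            # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + 1.0 - alpha**2 + beta
    ys = np.array([f(s) for s in sigmas])
    y_mean = wm @ ys
    diff = ys - y_mean
    return y_mean, (wc[:, None] * diff).T @ diff


# Hypothetical camera intrinsics and 3D keypoints (corners of a 4.4 x 1.8 x 1.5 m box,
# object frame with y pointing down, origin at the ground centre of the vehicle).
K = np.array([[700.0, 0.0, 640.0], [0.0, 700.0, 360.0], [0.0, 0.0, 1.0]])
pts3d = np.array([[x, y, z] for x in (-0.9, 0.9) for y in (0.0, -1.5) for z in (-2.2, 2.2)])
true_pose = np.array([0.3, 1.0, 1.5, 12.0])  # yaw [rad], tx, ty, tz [m]

kp = project(pts3d, true_pose, K)  # noise-free 2D keypoints
pose = solve_pnp_4dof(pts3d, kp, K, init=[0.1, 0.8, 1.3, 11.0])

kp_cov = 2.0**2 * np.eye(kp.size)  # assumed 2 px std per keypoint coordinate
mean, cov = unscented_transform(
    lambda v: solve_pnp_4dof(pts3d, v.reshape(-1, 2), K, init=pose), kp.ravel(), kp_cov)
print("pose:", pose, "pose std:", np.sqrt(np.diag(cov)))
```

Because every sigma point requires a full PnP solve, the cost grows linearly with the number of keypoint coordinates (2n + 1 solves for n inputs); in exchange, no analytic Jacobian of the PnP solution is needed.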

Author information

Authors and Affiliations

Poznan University of Technology, Institute of Robotics and Machine Intelligence, ul. Piotrowo 3A, 60-965, Poznań, Poland
Tomasz Nowak & Piotr Skrzypczyński

Corresponding author

Correspondence to Tomasz Nowak.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Nowak, T., Skrzypczyński, P. (2024). A Neural Network Architecture for Accurate 4D Vehicle Pose Estimation from Monocular Images with Uncertainty Assessment. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1962. Springer, Singapore. https://doi.org/10.1007/978-981-99-8132-8_30

DOI: https://doi.org/10.1007/978-981-99-8132-8_30
Published: 26 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8131-1
Online ISBN: 978-981-99-8132-8
eBook Packages: Computer Science, Computer Science (R0)
