Abstract
Relative camera pose estimation, i.e., estimating the translation and rotation vectors from a pair of images taken at different locations, is an important component of augmented reality and robotics systems. In this paper, we present an end-to-end relative camera pose estimation network that uses a siamese architecture and is independent of camera parameters. The network is trained on the Cambridge Landmarks data, using four individual scene datasets and a dataset combining the four scenes. To improve generalization, we propose a novel two-stage training that alleviates the need for a hyperparameter to balance the scales of the translation and rotation losses. We compare the proposed method with CNN-based methods that use one-stage training, such as RPNet and RCPNet, and demonstrate that the proposed model improves translation vector estimation by 16.11%, 28.88%, and 52.27% on the Kings College, Old Hospital, and St Marys Church scenes, respectively. To demonstrate texture invariance, we investigate, as ablation studies, how the proposed method generalizes when the datasets are augmented to different scene styles using generative adversarial networks. We also present a qualitative assessment of the epipolar lines obtained from our network predictions and from the ground truth poses.
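As a rough illustration of how a relative pose relates to epipolar lines, the sketch below builds an essential matrix from a rotation and a translation and checks the epipolar constraint on one synthetic point. It is not the paper's implementation. It assumes the convention X2 = R X1 + t between the two camera frames and works in normalized image coordinates; drawing lines in pixel coordinates would additionally require the camera intrinsics. The function names and the numeric values are hypothetical.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def essential_from_pose(R, t):
    """Essential matrix E = [t]_x R, assuming X2 = R @ X1 + t (assumed convention)."""
    return skew(t) @ R

def epipolar_line(E, x1):
    """Epipolar line in image 2 for the homogeneous point x1 of image 1.

    Returns (a, b, c) such that a*u + b*v + c = 0 in normalized coordinates.
    """
    return E @ x1

def rotation_y(angle_rad):
    """Rotation matrix about the camera y axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Hypothetical relative pose: 10 degree yaw and a small translation.
R = rotation_y(np.deg2rad(10.0))
t = np.array([0.5, 0.0, 0.1])
E = essential_from_pose(R, t)

# One synthetic 3D point expressed in the first camera frame.
X1 = np.array([0.3, -0.2, 4.0])
X2 = R @ X1 + t

# Normalized homogeneous image coordinates in each view.
x1 = X1 / X1[2]
x2 = X2 / X2[2]

# The matching point in image 2 should lie on the epipolar line of x1.
line2 = epipolar_line(E, x1)
residual = float(x2 @ line2)
print("epipolar residual:", residual)
```

With a predicted pose in place of the ground truth one, the same construction gives a second set of lines, and comparing the two sets is one way to visualize pose error.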
References
Bailo, O., Rameau, F., Joo, K., Park, J., Bogdan, O., Kweon, I.S.: Efficient adaptive non-maximal suppression algorithms for homogeneous spatial keypoint distribution. Pattern Recogn. Lett. 106, 53–60 (2018)
Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (surf). Comput. Vis. Image Underst. 110(3), 346–359 (2008)
Brachmann, E., et al.: Dsac-differentiable ransac for camera localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6684–6692 (2017)
Bradski, G.: The OpenCV library. Dr. Dobb’s J. Softw. Tools (2000)
Chen, K., Snavely, N., Makadia, A.: Wide-baseline relative camera pose estimation with directional learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3258–3268 (2021)
Dusmanu, M., et al.: D2-net: a trainable CNN for joint detection and description of local features. arXiv preprint arXiv:1905.03561 (2019)
En, S., Lechervy, A., Jurie, F.: Rpnet: an end-to-end network for relative camera pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018)
Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
Graziani, M., Lompech, T., Müller, H., Depeursinge, A., Andrearczyk, V.: On the scale invariance in state of the art CNNs trained on imagenet. Mach. Learn. Knowl. Extraction 3(2), 374–391 (2021)
Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press (2003)
Hartley, R.I.: In defense of the eight-point algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 19(6), 580–593 (1997)
Howard, A., et al.: Searching for mobilenetv3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324 (2019)
Hwang, K., Cho, J., Park, J., Har, D., Ahn, S.: Ferrite position identification system operating with wireless power transfer for intelligent train position detection. IEEE Trans. Intell. Transp. Syst. 20(1), 374–382 (2018)
Kendall, A., Cipolla, R.: Geometric loss functions for camera pose regression with deep learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5974–5983 (2017)
Kendall, A., Grimes, M., Cipolla, R.: Posenet: a convolutional network for real-time 6-dof camera relocalization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2938–2946 (2015)
Kim, S., Kim, I., Vecchietti, L.F., Har, D.: Pose estimation utilizing a gated recurrent unit network for visual localization. Appl. Sci. 10(24), 8876 (2020)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Lee, S., Lee, J., Jung, H., Cho, J., Hong, J., Lee, S., Har, D.: Optimal power management for nanogrids based on technical information of electric appliances. Energy Build. 191, 174–186 (2019)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
Melekhov, I., Ylioinas, J., Kannala, J., Rahtu, E.: Relative camera pose estimation using convolutional neural networks. In: Blanc-Talon, J., Penne, R., Philips, W., Popescu, D., Scheunders, P. (eds.) ACIVS 2017. LNCS, vol. 10617, pp. 675–687. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70353-4_57
Mina, R.: fast-neural-style: Fast style transfer in pytorch! (2018). https://github.com/iamRusty/fast-neural-style-pytorch
Moraes, C., Myung, S., Lee, S., Har, D.: Distributed sensor nodes charged by mobile charger with directional antenna and by energy trading for balancing. Sensors 17(1), 122 (2017)
Nistér, D.: An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell. 26(6), 756–770 (2004)
Paszke, A., et al.: Adaptiveavgpool2d. https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html
Paszke, A., et al.: Pytorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc. (2019). https://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2007)
Poursaeed, O., et al.: Deep fundamental matrix estimation without correspondences. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11131, pp. 485–497. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11015-4_35
Raguram, R., Frahm, J.-M., Pollefeys, M.: A comparative analysis of RANSAC techniques leading to adaptive real-time random sample consensus. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5303, pp. 500–513. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88688-4_37
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: Orb: an efficient alternative to sift or surf. In: 2011 International Conference on Computer Vision, pp. 2564–2571. IEEE (2011)
Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: Superglue: learning feature matching with graph neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4938–4947 (2020)
Schönberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Seo, M., Vecchietti, L.F., Lee, S., Har, D.: Rewards prediction-based credit assignment for reinforcement learning with sparse binary rewards. IEEE Access 7, 118776–118791 (2019)
Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: Loftr: detector-free local feature matching with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8922–8931 (2021)
Wynn, K.: pyquaternion (2020). https://github.com/KieranWynn/pyquaternion
Yang, C., Liu, Y., Zell, A.: Rcpnet: deep-learning based relative camera pose estimation for uavs. In: 2020 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 1085–1092. IEEE (2020)
Yew, Z.J., Lee, G.H.: Regtr: end-to-end point cloud correspondences with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6677–6686 (2022)
Acknowledgement
This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-00440, Development of Artificial Intelligence Technology that continuously improves itself as the situation changes in the real world).
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rajendran, P.K., Mishra, S., Vecchietti, L.F., Har, D. (2023). RelMobNet: End-to-End Relative Camera Pose Estimation Using a Robust Two-Stage Training. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_18
DOI: https://doi.org/10.1007/978-3-031-25075-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25074-3
Online ISBN: 978-3-031-25075-0
eBook Packages: Computer Science, Computer Science (R0)