Abstract
The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions, rather than explicit semantic object detections, to guide the pose estimation problem. We propose Pose Refiner Network (PoserNet), a lightweight Graph Neural Network that refines approximate pair-wise relative camera poses. PoserNet exploits associations between the objectness regions (concisely expressed as bounding boxes) across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across view graphs of various sizes and show that this process benefits optimisation-based Motion Averaging algorithms, improving the median rotation error by 62° with respect to the initial estimates obtained from bounding boxes. Code and data are available at github.com/IIT-PAVIS/PoserNet.
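The abstract quantifies the improvement as a median rotation error. As a hedged illustration, assuming the common geodesic metric θ = arccos((tr(R_estᵀ R_gt) − 1) / 2) (the paper's exact evaluation protocol may differ), the sketch below computes a per-edge relative-rotation error over a small view graph and takes the median. The function names `rotation_error_deg` and `median_rotation_error` and the toy rotations about the z-axis are hypothetical; they are not taken from the PoserNet code or the 7-Scenes data.

```python
import math
from statistics import median

# Hypothetical helpers: 3x3 rotation matrices stored as nested lists (no numpy needed).
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Matrix) -> Matrix:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_z(deg: float) -> Matrix:
    """Rotation about the z-axis, used here only to build toy examples."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_error_deg(r_est: Matrix, r_gt: Matrix) -> float:
    """Geodesic angle (degrees) of R_est^T R_gt, an assumed error metric."""
    r = matmul(transpose(r_est), r_gt)
    trace = r[0][0] + r[1][1] + r[2][2]
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def median_rotation_error(estimates: dict, ground_truth: dict) -> float:
    """Median error over the edges (i, j) of a view graph, keyed by camera pair."""
    return median(rotation_error_deg(estimates[e], ground_truth[e]) for e in ground_truth)


# Toy view graph with three edges: true vs. estimated relative rotations.
gt = {(0, 1): rot_z(30.0), (1, 2): rot_z(-45.0), (0, 2): rot_z(10.0)}
est = {(0, 1): rot_z(40.0), (1, 2): rot_z(-45.0), (0, 2): rot_z(30.0)}
print(f"median rotation error: {median_rotation_error(est, gt):.1f} deg")  # prints "median rotation error: 10.0 deg"
```

Using the median rather than the mean keeps the summary robust to a few badly estimated edges, which is typical of sparse view graphs initialised from coarse cues such as bounding boxes.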
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 870743.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Taiana, M., Toso, M., James, S., Del Bue, A. (2022). PoserNet: Refining Relative Camera Poses Exploiting Object Detections. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13693. Springer, Cham. https://doi.org/10.1007/978-3-031-19827-4_15
DOI: https://doi.org/10.1007/978-3-031-19827-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19826-7
Online ISBN: 978-3-031-19827-4
eBook Packages: Computer Science, Computer Science (R0)