PoserNet: Refining Relative Camera Poses Exploiting Object Detections

Taiana, Matteo; Toso, Matteo; James, Stuart; Del Bue, Alessio

doi:10.1007/978-3-031-19827-4_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13693)

Included in the following conference series:

European Conference on Computer Vision

Abstract

The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions, rather than explicit semantic object detections, to guide the pose estimation problem. We propose Pose Refiner Network (PoserNet), a lightweight Graph Neural Network that refines approximate pair-wise relative camera poses. PoserNet exploits associations between objectness regions, concisely expressed as bounding boxes, across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across view graphs of varied sizes and show how this process can benefit optimisation-based Motion Averaging algorithms, improving the median rotation error by 62° with respect to the initial estimates obtained from bounding boxes. Code and data are available at github.com/IIT-PAVIS/PoserNet.
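Two ingredients named in the abstract can be made concrete: turning the pair-wise relative rotations on a sparse view graph into absolute camera rotations (the job of a Motion Averaging back-end), and the angular rotation error whose median is reported. The sketch below is neither PoserNet nor the authors' back-end. It swaps in spectral rotation synchronization, a simple closed-form relaxation chosen only for brevity, and assumes the convention R_ij ≈ R_i R_jᵀ, a connected graph, and each undirected edge listed once. All function names are hypothetical.

```python
import numpy as np


def rotation_from_axis_angle(w):
    """Rodrigues' formula: rotation matrix for an axis-angle vector w (radians)."""
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two rotation matrices."""
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def project_to_so3(M):
    """Closest rotation (Frobenius norm) to a 3x3 matrix, via SVD."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt


def spectral_rotation_averaging(n, rel_rots):
    """Absolute rotations R_i from relative ones R_ij ~= R_i @ R_j.T.

    rel_rots: dict {(i, j): R_ij}, one entry per undirected edge of a
    connected view graph (assumption). The result is defined only up to
    one global rotation, so compare relative rotations, not absolute ones.
    """
    G = np.zeros((3 * n, 3 * n))
    deg = np.ones(n)  # each node counts its own identity diagonal block
    for i in range(n):
        G[3*i:3*i+3, 3*i:3*i+3] = np.eye(3)
    for (i, j), R_ij in rel_rots.items():
        G[3*i:3*i+3, 3*j:3*j+3] = R_ij
        G[3*j:3*j+3, 3*i:3*i+3] = R_ij.T
        deg[i] += 1
        deg[j] += 1
    # Symmetric degree normalisation: M = D^(-1/2) G D^(-1/2).
    # In the noiseless case its top-3 eigenvectors span D^(1/2) X,
    # where X stacks the absolute rotations.
    d = np.repeat(1.0 / np.sqrt(deg), 3)
    M = d[:, None] * G * d[None, :]
    _, vecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    X = d[:, None] * vecs[:, -3:]    # undo normalisation: X = D^(-1/2) V
    blocks = [X[3*i:3*i+3] for i in range(n)]
    # Resolve the global reflection ambiguity (make determinants positive).
    if sum(np.linalg.det(B) for B in blocks) < 0:
        blocks = [-B for B in blocks]
    return [project_to_so3(B) for B in blocks]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6
    gt = [rotation_from_axis_angle(rng.normal(size=3)) for _ in range(n)]
    # Sparse, connected view graph: a ring plus two chords (hypothetical data).
    edges = [(i, (i + 1) % n) for i in range(n)] + [(0, 3), (1, 4)]
    rel = {(i, j): gt[i] @ gt[j].T for i, j in edges}
    est = spectral_rotation_averaging(n, rel)
    errs = [rotation_error_deg(est[i] @ est[j].T, rel[(i, j)]) for i, j in edges]
    print(f"median relative-rotation error: {np.median(errs):.2e} deg")
```

With noiseless inputs the recovered rotations reproduce every relative rotation up to floating-point error. In the pipeline the abstract describes, PoserNet would refine the edge poses before they reach a Motion Averaging step such as this one; the refinement network itself is not sketched here.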

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 870743.

Author information

Authors and Affiliations

Pattern Analysis and Computer Vision (PAVIS), Istituto Italiano di Tecnologia (IIT), Genoa, Italy
Matteo Taiana, Matteo Toso, Stuart James & Alessio Del Bue

Corresponding author

Correspondence to Matteo Taiana.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 4700 KB)

About this paper

Cite this paper

Taiana, M., Toso, M., James, S., Del Bue, A. (2022). PoserNet: Refining Relative Camera Poses Exploiting Object Detections. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13693. Springer, Cham. https://doi.org/10.1007/978-3-031-19827-4_15

DOI: https://doi.org/10.1007/978-3-031-19827-4_15
Published: 02 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19826-7
Online ISBN: 978-3-031-19827-4
eBook Packages: Computer Science, Computer Science (R0)