PoserNet: Refining Relative Camera Poses Exploiting Object Detections

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13693))

Abstract

The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions, rather than explicit semantic object detections, to guide the pose estimation problem. We propose Pose Refiner Network (PoserNet), a lightweight Graph Neural Network that refines approximate pairwise relative camera poses. PoserNet exploits associations between the objectness regions, concisely expressed as bounding boxes, across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across graphs of varied sizes and show how this process can benefit optimisation-based Motion Averaging algorithms, improving the median error on the rotation by 62° with respect to the initial estimates obtained based on bounding boxes. Code and data are available at github.com/IIT-PAVIS/PoserNet.
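
The abstract reports its result as a median error on the rotation. A common way to measure such an error is the geodesic angle between an estimated and a ground-truth relative rotation, taken as the median over the edges of the view graph. The following is a minimal sketch under that assumption; the paper's exact evaluation protocol is not given in this excerpt, and the function names and toy rotations are hypothetical.

```python
import math
from statistics import median


def rot_z(deg):
    """Rotation matrix about the z-axis by `deg` degrees (toy helper for the demo)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle, in degrees, between two 3x3 rotation matrices."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def median_rotation_error_deg(estimates, ground_truth):
    """Median geodesic error over the edges of a view graph.

    Both arguments map an edge (i, j) to the relative rotation of that image pair.
    """
    return median(
        rotation_error_deg(estimates[edge], ground_truth[edge])
        for edge in ground_truth
    )


# Hypothetical three-view graph: edges are image pairs, values are relative rotations.
gt = {(0, 1): rot_z(30), (1, 2): rot_z(45), (0, 2): rot_z(75)}
est = {(0, 1): rot_z(35), (1, 2): rot_z(45), (0, 2): rot_z(65)}
print(round(median_rotation_error_deg(est, gt), 3))
```

In this toy graph the per-edge errors are 5, 0 and 10 degrees. The median is less sensitive than the mean to a few badly estimated pairs, which is one reason it is often used to summarise pose errors.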

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 870743.

Author information

Corresponding author

Correspondence to Matteo Taiana.

Electronic supplementary material

Supplementary material 1 (pdf 4700 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Taiana, M., Toso, M., James, S., Del Bue, A. (2022). PoserNet: Refining Relative Camera Poses Exploiting Object Detections. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13693. Springer, Cham. https://doi.org/10.1007/978-3-031-19827-4_15

  • DOI: https://doi.org/10.1007/978-3-031-19827-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19826-7

  • Online ISBN: 978-3-031-19827-4

  • eBook Packages: Computer Science, Computer Science (R0)
