360° Camera Alignment via Segmentation

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12373)

Abstract

Panoramic 360° images taken under unconstrained conditions present a significant challenge to current state-of-the-art recognition pipelines, since the assumption of a mostly upright camera is no longer valid. In this work, we investigate how to solve this problem by fusing purely geometric cues, such as apparent vanishing points, with learned semantic cues, such as the expectation that some visual elements (e.g. doors) have a natural upright position. We train a deep neural network to leverage these cues to segment the image-space endpoints of an imagined “vertical axis”, which is orthogonal to the ground plane of a scene, thus levelling the camera. We show that our segmentation-based strategy significantly increases performance, reducing errors by half compared to the current state-of-the-art on two datasets of 360° imagery. We also demonstrate the importance of 360° camera levelling by analysing its impact on downstream tasks, finding that incorrect levelling severely degrades the performance of real-world computer vision pipelines.
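To make the levelling step concrete, the following is a minimal sketch of the geometry only, not the authors' implementation. It assumes an equirectangular panorama, a z-up coordinate convention, and that one endpoint of the vertical axis has already been located as a pixel (for example, the peak of a segmentation map). The function names `pixel_to_direction`, `levelling_rotation`, and `rotate` are hypothetical and chosen for illustration.

```python
import math

def pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit vector (assumed z-up)."""
    lon = (u / width) * 2.0 * math.pi - math.pi    # longitude in [-pi, pi)
    lat = math.pi / 2.0 - (v / height) * math.pi   # latitude, +pi/2 at the top row
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def levelling_rotation(up):
    """Rotation matrix (Rodrigues' formula) taking unit vector `up` to (0, 0, 1)."""
    x, y, z = up
    s = math.hypot(x, y)   # sine of the tilt angle
    c = z                  # cosine of the tilt angle
    if s < 1e-12:
        # Already aligned with +z, or pointing straight down (flip about x).
        return [[1, 0, 0], [0, 1, 0], [0, 0, 1]] if c > 0 else \
               [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
    # Unit rotation axis k = (up x z_hat) / s; its z component is zero.
    kx, ky = y / s, -x / s
    t = 1.0 - c
    return [[1 - t * ky * ky, t * kx * ky,      s * ky],
            [t * kx * ky,     1 - t * kx * kx, -s * kx],
            [-s * ky,         s * kx,           c]]

def rotate(R, vec):
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return tuple(sum(R[i][j] * vec[j] for j in range(3)) for i in range(3))

# Usage: an "up" endpoint detected slightly off the top row of a 1024x512 panorama.
up = pixel_to_direction(600, 40, 1024, 512)
R = levelling_rotation(up)
levelled_up = rotate(R, up)   # approximately (0, 0, 1)
```

Applying the same rotation to every viewing direction of the panorama would resample it into a levelled frame. How the endpoint is obtained from the network output is outside the scope of this sketch.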



Author information

Corresponding author

Correspondence to Benjamin Davidson.

1 Electronic supplementary material

Supplementary material 1 (pdf 130 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Davidson, B., Alvi, M.S., Henriques, J.F. (2020). 360° Camera Alignment via Segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12373. Springer, Cham. https://doi.org/10.1007/978-3-030-58604-1_35

  • DOI: https://doi.org/10.1007/978-3-030-58604-1_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58603-4

  • Online ISBN: 978-3-030-58604-1

  • eBook Packages: Computer Science, Computer Science (R0)
