
MODE: Multi-view Omnidirectional Depth Estimation with 360° Cameras

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13693)

Abstract

In this paper, we propose a two-stage omnidirectional depth estimation framework with multi-view 360° cameras. The framework first estimates the depth maps from different camera pairs via omnidirectional stereo matching and then fuses the depth maps to achieve robustness against mud spots, water drops on camera lenses, and glare caused by intense light. We adopt spherical feature learning to address the distortion of panoramas. In addition, a synthetic 360° dataset consisting of 12K road scene panoramas and 3K ground truth depth maps is presented to train and evaluate 360° depth estimation algorithms. Our dataset takes soiled camera lenses and glare into consideration, which is more consistent with the real-world environment. Experimental results show that the proposed framework generates reliable results in both synthetic and real-world environments, and it achieves state-of-the-art performance on different datasets. The code and data are available at https://github.com/nju-ee/MODE-2022.
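To make the two-stage design concrete, below is a minimal, runnable Python sketch of the pipeline structure described in the abstract. The function names (stereo_depth, fuse_depths) and the confidence-weighted fusion rule are illustrative assumptions, not the paper's method; the actual implementation uses spherical feature learning and a learned fusion network, available in the linked repository.

```python
# Illustrative sketch of the two-stage pipeline: per-pair omnidirectional
# stereo matching, followed by fusion of the resulting depth maps.
# All function names and the fusion rule are hypothetical placeholders.
import numpy as np

def stereo_depth(pano_a, pano_b):
    """Stage one (placeholder): estimate depth for one 360° camera pair.

    Returns a dummy depth map and confidence map of the panorama's size so
    the sketch stays runnable; the paper uses a spherical stereo network here.
    """
    h, w = pano_a.shape[:2]
    depth = np.full((h, w), 10.0)   # dummy depth in metres
    confidence = np.ones((h, w))    # dummy confidence in [0, 1]
    return depth, confidence

def fuse_depths(depths, confidences):
    """Stage two (placeholder): fuse per-pair depth maps into one panorama.

    A confidence-weighted average is used purely for illustration; the paper
    fuses with a learned network to be robust to soiled lenses and glare.
    """
    depths = np.stack(depths)        # (num_pairs, H, W)
    weights = np.stack(confidences)  # (num_pairs, H, W)
    weights = weights / np.clip(weights.sum(axis=0, keepdims=True), 1e-6, None)
    return (depths * weights).sum(axis=0)

# Example: three 360° cameras yield three camera pairs; estimate, then fuse.
panos = [np.zeros((512, 1024, 3)) for _ in range(3)]
pairs = [(0, 1), (0, 2), (1, 2)]
per_pair = [stereo_depth(panos[i], panos[j]) for i, j in pairs]
fused = fuse_depths([d for d, _ in per_pair], [c for _, c in per_pair])
print(fused.shape)  # (512, 1024)
```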

M. Li and X. Jin—Contributed equally to this work.


Notes

  1. We use the terms omnidirectional, 360°, spherical, and panorama interchangeably in this document.


Acknowledgements

We acknowledge the computational resources provided by the High-Performance Computing Center of the Collaborative Innovation Center of Advanced Microstructures, Nanjing University, and Nanjing Institute of Advanced Artificial Intelligence.

Author information


Corresponding authors

Correspondence to Sidan Du or Yang Li.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4974 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, M., Jin, X., Hu, X., Dai, J., Du, S., Li, Y. (2022). MODE: Multi-view Omnidirectional Depth Estimation with 360° Cameras. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13693. Springer, Cham. https://doi.org/10.1007/978-3-031-19827-4_12


  • DOI: https://doi.org/10.1007/978-3-031-19827-4_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19826-7

  • Online ISBN: 978-3-031-19827-4

