
MODE: Multi-view Omnidirectional Depth Estimation with 360° Cameras

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13693)

Abstract

In this paper, we propose a two-stage omnidirectional depth estimation framework with multi-view 360° cameras. The framework first estimates the depth maps from different camera pairs via omnidirectional stereo matching and then fuses the depth maps to achieve robustness against mud spots, water drops on camera lenses, and glare caused by intense light. We adopt spherical feature learning to address the distortion of panoramas. In addition, a synthetic 360° dataset consisting of 12K road scene panoramas and 3K ground truth depth maps is presented to train and evaluate 360° depth estimation algorithms. Our dataset takes soiled camera lenses and glare into consideration, which is more consistent with the real-world environment. Experimental results show that the proposed framework generates reliable results in both synthetic and real-world environments, and it achieves state-of-the-art performance on different datasets. The code and data are available at https://github.com/nju-ee/MODE-2022.
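To make the two-stage design concrete, below is a minimal, runnable Python sketch of the pipeline structure described in the abstract. The function names (stereo_depth, fuse_depths) and the confidence-weighted fusion rule are illustrative assumptions, not the paper's method; the actual implementation uses spherical feature learning and a learned fusion network, available in the linked repository.

```python
# Illustrative sketch of the two-stage pipeline: per-pair omnidirectional
# stereo matching, followed by fusion of the resulting depth maps.
# All function names and the fusion rule are hypothetical placeholders.
import numpy as np

def stereo_depth(pano_a, pano_b):
    """Stage one (placeholder): estimate depth for one 360° camera pair.

    Returns a dummy depth map and confidence map of the panorama's size so
    the sketch stays runnable; the paper uses a spherical stereo network here.
    """
    h, w = pano_a.shape[:2]
    depth = np.full((h, w), 10.0)   # dummy depth in metres
    confidence = np.ones((h, w))    # dummy confidence in [0, 1]
    return depth, confidence

def fuse_depths(depths, confidences):
    """Stage two (placeholder): fuse per-pair depth maps into one panorama.

    A confidence-weighted average is used purely for illustration; the paper
    fuses with a learned network to be robust to soiled lenses and glare.
    """
    depths = np.stack(depths)        # (num_pairs, H, W)
    weights = np.stack(confidences)  # (num_pairs, H, W)
    weights = weights / np.clip(weights.sum(axis=0, keepdims=True), 1e-6, None)
    return (depths * weights).sum(axis=0)

# Example: three 360° cameras yield three camera pairs; estimate, then fuse.
panos = [np.zeros((512, 1024, 3)) for _ in range(3)]
pairs = [(0, 1), (0, 2), (1, 2)]
per_pair = [stereo_depth(panos[i], panos[j]) for i, j in pairs]
fused = fuse_depths([d for d, _ in per_pair], [c for _, c in per_pair])
print(fused.shape)  # (512, 1024)
```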

M. Li and X. Jin—Contributed equally to this work.


Notes

  1. We use the terms omnidirectional, 360°, spherical, and panorama interchangeably in this document.


Acknowledgements

We acknowledge the computational resources provided by the High-Performance Computing Center of the Collaborative Innovation Center of Advanced Microstructures, Nanjing University, and Nanjing Institute of Advanced Artificial Intelligence.

Author information


Corresponding authors

Correspondence to Sidan Du or Yang Li.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4974 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, M., Jin, X., Hu, X., Dai, J., Du, S., Li, Y. (2022). MODE: Multi-view Omnidirectional Depth Estimation with 360° Cameras. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13693. Springer, Cham. https://doi.org/10.1007/978-3-031-19827-4_12


  • DOI: https://doi.org/10.1007/978-3-031-19827-4_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19826-7

  • Online ISBN: 978-3-031-19827-4

