Pixel-Pair Occlusion Relationship Map (P2ORM): Formulation, Inference and Application

Qiu, Xuchong; Xiao, Yang; Wang, Chaohui; Marlet, Renaud

doi:10.1007/978-3-030-58548-8_40

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12349)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We formalize concepts around geometric occlusion in 2D images (i.e., ignoring semantics), and propose a novel unified formulation of both occlusion boundaries and occlusion orientations via a pixel-pair occlusion relation. The former provides a way to generate large-scale accurate occlusion datasets while, based on the latter, we propose a novel method for task-independent pixel-level occlusion relationship estimation from single images. Experiments on a variety of datasets demonstrate that our method outperforms existing ones on this task. To further illustrate the value of our formulation, we also propose a new depth map refinement method that consistently improves the performance of state-of-the-art monocular depth estimation methods.
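
To make the idea of a pixel-pair occlusion relation more concrete, the following is a minimal illustrative sketch, not the authors' formulation. It assumes a depth map is available and that a pixel occludes a neighbour when the neighbour is farther from the camera by more than a fixed gap. The 8-neighbourhood, the `gap` threshold, the function names and the rule that a boundary pixel is one that occludes at least one neighbour are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, col)

# 8-connected neighbourhood offsets (an assumption of this sketch).
NEIGHBOUR_OFFSETS: List[Pixel] = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, -1),           (0, 1),
    (1, -1),  (1, 0),  (1, 1),
]


def pair_relation(depth: List[List[float]], p: Pixel, q: Pixel, gap: float) -> int:
    """Return +1 if p occludes q, -1 if q occludes p, 0 if neither.

    Simplified rule (assumed here): the nearer pixel occludes the other
    when their depths differ by more than `gap`.
    """
    d_p = depth[p[0]][p[1]]
    d_q = depth[q[0]][q[1]]
    if d_q - d_p > gap:
        return 1
    if d_p - d_q > gap:
        return -1
    return 0


def relation_map(depth: List[List[float]], gap: float = 0.5) -> Dict[Tuple[Pixel, Pixel], int]:
    """Compute the relation for every ordered pair of neighbouring pixels."""
    height, width = len(depth), len(depth[0])
    relations: Dict[Tuple[Pixel, Pixel], int] = {}
    for r in range(height):
        for c in range(width):
            for dr, dc in NEIGHBOUR_OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    relations[((r, c), (rr, cc))] = pair_relation(
                        depth, (r, c), (rr, cc), gap
                    )
    return relations


def occluder_mask(depth: List[List[float]], gap: float = 0.5) -> List[List[bool]]:
    """Mark pixels that occlude at least one neighbour (one possible boundary convention)."""
    height, width = len(depth), len(depth[0])
    mask = [[False] * width for _ in range(height)]
    for (p, _q), rel in relation_map(depth, gap).items():
        if rel == 1:
            mask[p[0]][p[1]] = True
    return mask
```

For a toy depth map such as `[[1.0, 1.0, 3.0], [1.0, 1.0, 3.0]]`, the middle column is nearer than the right column, so under these assumptions the middle-column pixels are marked as occluders. The sign of each pair relation carries the orientation information, while the set of non-zero pairs carries the boundary information.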

Notes

1.
As DOOBNet and OFNet are implemented in Caffe, we carefully re-implemented them in PyTorch (following the Caffe code) in order to have a unified platform for experimenting with them on new datasets. We could not reproduce exactly the quantitative values reported in the original papers (ODS and OIS are slightly lower, while AP is slightly higher), probably because of intrinsic differences between the Caffe and PyTorch frameworks. However, the difference is very small (less than 0.03, cf. Table 2).

References

Acuna, D., Kar, A., Fidler, S.: Devil is in the edges: learning semantic boundaries from noisy annotations. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11075–11083 (2019)
Apostoloff, N., Fitzgibbon, A.: Learning spatiotemporal t-junctions for occlusion detection. In: Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 553–559. IEEE (2005)
Barron, J.T., Poole, B.: The fast bilateral solver. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 617–632. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_38
Boulch, A., Marlet, R.: Fast and robust normal estimation for point clouds with sharp features. Comput. Graph. Forum (CGF) 31(5), 1765–1774 (2012)
Cooper, M.C.: Interpreting line drawings of curved objects with tangential edges and surfaces. Image Vis. Comput. 15(4), 263–276 (1997)
Dollár, P., Zitnick, C.L.: Fast edge detection using structured forests. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 37(8), 1558–1570 (2014)
Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2650–2658 (2015)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems (NeurIPS), pp. 2366–2374. Curran Associates, Inc. (2014)
Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2002–2011 (2018)
Fu, H., Wang, C., Tao, D., Black, M.J.: Occlusion boundary detection via deep exploration of context. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 241–250 (2016)
Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 270–279 (2017)
He, K., Sun, J., Tang, X.: Guided image filtering. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 1–14. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15549-9_1
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
He, X., Yuille, A.: Occlusion boundary detection using pseudo-depth. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 539–552. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_39
Heise, P., Klose, S., Jensen, B., Knoll, A.: PM-Huber: patchmatch with Huber regularization for stereo matching. In: International Conference on Computer Vision (ICCV), pp. 2360–2367 (2013)
Heo, M., Lee, J., Kim, K.-R., Kim, H.-U., Kim, C.-S.: Monocular depth estimation using whole strip masking and reliability-based refinement. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 39–55. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_3
Hoiem, D., Efros, A.A., Hebert, M.: Recovering occlusion boundaries from an image. Int. J. Comput. Vis. (IJCV) 91, 328–346 (2010)
Hong, Z., Chen, Z., Wang, C., Mei, X., Prokhorov, D., Tao, D.: Multi-store tracker (muster): a cognitive psychology inspired approach to object tracking. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 749–758 (2015)
Ilg, E., Saikia, T., Keuper, M., Brox, T.: Occlusions, motion and depth boundaries with a generic network for disparity, optical flow or scene flow estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 626–643. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_38
Jiao, J., Cao, Y., Song, Y., Lau, R.: Look deeper into depth: monocular depth estimation with semantic booster and attention-driven loss. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 55–71. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_4
Koch, T., Liebel, L., Fraundorfer, F., Körner, M.: Evaluation of CNN-based single-image depth estimation methods. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11131, pp. 331–348. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11015-4_25
Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks. In: International Conference on 3D Vision (3DV), pp. 239–248. IEEE (2016)
Lee, J.H., Kim, C.S.: Monocular depth estimation using relative depth maps. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9729–9738 (2019)
Leichter, I., Lindenbaum, M.: Boundary ownership by lifting to 2.1-D. In: International Conference on Computer Vision (ICCV), pp. 9–16. IEEE (2008)
Li, J.Y., Klein, R., Yao, A.: A two-streamed network for estimating fine-scaled depth maps from single RGB images. In: International Conference on Computer Vision (ICCV), pp. 3392–3400 (2016)
Li, W., et al.: InteriorNet: mega-scale multi-sensor photo-realistic indoor scenes dataset. In: British Machine Vision Conference (BMVC) (2018)
Liu, C., Yang, J., Ceylan, D., Yumer, E., Furukawa, Y.: PlaneNet: piece-wise planar reconstruction from a single RGB image. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2579–2588 (2018)
Liu, F., Shen, C., Lin, G.: Deep convolutional neural fields for depth estimation from a single image. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Liu, Y., Cheng, M.M., Fan, D.P., Zhang, L., Bian, J., Tao, D.: Semantic edge detection with diverse deep supervision. arXiv preprint arXiv:1804.02864 (2018)
Lu, R., Xue, F., Zhou, M., Ming, A., Zhou, Y.: Occlusion-shared and feature-separated network for occlusion relationship reasoning. In: International Conference on Computer Vision (ICCV) (2019)
Martin, D.R., Fowlkes, C.C., Malik, J.: Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 26(5), 530–549 (2004)
Massa, F., Marlet, R., Aubry, M.: Crafting a multi-task CNN for viewpoint estimation. In: British Machine Vision Conference (BMVC) (2016)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54
Nitzberg, M., Mumford, D.B.: The 2.1-D Sketch. IEEE Computer Society Press (1990)
Oberweger, M., Rad, M., Lepetit, V.: Making deep heatmaps robust to partial occlusions for 3D object pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 125–141. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_8
Peng, S., Liu, Y., Huang, Q., Zhou, X., Bao, H.: PVNet: pixel-wise voting network for 6DoF pose estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4561–4570 (2019)
Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: International Conference on Computer Vision (ICCV), pp. 3828–3836 (2017)
Rafi, U., Gall, J., Leibe, B.: A semantic occlusion model for human pose estimation from a single depth image. In: Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), pp. 67–74 (2015)
Ramamonjisoa, M., Du, Y., Lepetit, V.: Predicting sharp and accurate occlusion boundaries in monocular depth estimation using displacement fields. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14648–14657 (2020)
Ramamonjisoa, M., Lepetit, V.: SharpNet: fast and accurate recovery of occluding contours in monocular depth estimation. In: International Conference on Computer Vision Workshops (ICCV Workshops) (2019)
Raskar, R., Tan, K.H., Feris, R., Yu, J., Turk, M.: Non-photorealistic camera: depth edge detection and stylized rendering using multi-flash imaging. ACM Trans. Graph. (TOG) 23(3), 679–688 (2004)
Ren, X., Fowlkes, C.C., Malik, J.: Figure/ground assignment in natural images. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 614–627. Springer, Heidelberg (2006). https://doi.org/10.1007/11744047_47
Ricci, E., Ouyang, W., Wang, X., Sebe, N., et al.: Monocular depth estimation using multi-scale continuous CRFs as sequential deep networks. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 41(6), 1426–1440 (2018)
Roberts, L.G.: Machine perception of three-dimensional solids. Ph.D. thesis, Massachusetts Institute of Technology (1963)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing & Computer Assisted Intervention (MICCAI) (2015)
Stein, A.N., Hebert, M.: Occlusion boundaries from motion: low-level detection and mid-level reasoning. Int. J. Comput. Vis. (IJCV) 82, 325–357 (2008)
Su, H., Jampani, V., Sun, D., Gallo, O., Learned-Miller, E., Kautz, J.: Pixel-adaptive convolutional neural networks. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11166–11175 (2019)
Sugihara, K.: Machine Interpretation of Line Drawings, vol. 1. MIT Press, Cambridge (1986)
Teo, C., Fermuller, C., Aloimonos, Y.: Fast 2D border ownership assignment. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5117–5125 (2015)
Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: International Conference on Computer Vision (ICCV), pp. 839–846 (1998)
Wang, G., Liang, X., Li, F.W.B.: DOOBNet: deep object occlusion boundary detection from an image. In: Asian Conference on Computer Vision (ACCV) (2018)
Wang, P., Shen, X., Russell, B., Cohen, S., Price, B., Yuille, A.L.: Surge: surface regularized geometry estimation from a single image. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 172–180 (2016)
Wang, P., Yuille, A.: DOC: deep OCclusion estimation from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 545–561. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_33
Wang, Y., Yang, Y., Yang, Z., Zhao, L., Wang, P., Xu, W.: Occlusion aware unsupervised learning of optical flow. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4884–4893 (2018)
Wu, H., Zheng, S., Zhang, J., Huang, K.: Fast end-to-end trainable guided filter. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1838–1847 (2018)
Xie, S., Tu, Z.: Holistically-nested edge detection. In: International Conference on Computer Vision (ICCV), pp. 1395–1403 (2015)
Xu, D., Ricci, E., Ouyang, W., Wang, X., Sebe, N.: Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5354–5362 (2017)
Yin, W., Liu, Y., Shen, C., Yan, Y.: Enforcing geometric constraints of virtual normal for depth prediction. In: International Conference on Computer Vision (ICCV) (2019)
Yu, Z., Feng, C., Liu, M.Y., Ramalingam, S.: CASENet: deep category-aware semantic edge detection. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5964–5973 (2017)
Yu, Z., et al.: Simultaneous edge alignment and learning. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 400–417. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_24
Zheng, C., Cham, T.-J., Cai, J.: T²Net: synthetic-to-realistic translation for solving single-image depth estimation tasks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 798–814. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_47
Zitnick, C.L., Kanade, T.: A cooperative algorithm for stereo matching and occlusion detection. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 22(7), 675–684 (2000)

Acknowledgements.

We thank Yuming Du and Michael Ramamonjisoa for helpful discussions and for offering their GT annotations of occlusion boundaries for a large part of NYUv2, which we completed (NYUv2-OC++) [39]. This work was partly funded by the I-Site FUTURE initiative, through the DiXite project.

Author information

Authors and Affiliations

LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, ESIEE Paris, Champs-sur-Marne, France
Xuchong Qiu, Yang Xiao, Chaohui Wang & Renaud Marlet
valeo.ai, Paris, France
Renaud Marlet

Corresponding author

Correspondence to Chaohui Wang.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Cite this paper

Qiu, X., Xiao, Y., Wang, C., Marlet, R. (2020). Pixel-Pair Occlusion Relationship Map (P2ORM): Formulation, Inference and Application. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12349. Springer, Cham. https://doi.org/10.1007/978-3-030-58548-8_40

DOI: https://doi.org/10.1007/978-3-030-58548-8_40
Published: 29 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58547-1
Online ISBN: 978-3-030-58548-8
eBook Packages: Computer Science, Computer Science (R0)
