Abstract
We formalize concepts around geometric occlusion in 2D images (i.e., ignoring semantics), and propose a novel unified formulation of both occlusion boundaries and occlusion orientations via a pixel-pair occlusion relation. The former provides a way to generate large-scale accurate occlusion datasets while, based on the latter, we propose a novel method for task-independent pixel-level occlusion relationship estimation from single images. Experiments on a variety of datasets demonstrate that our method outperforms existing ones on this task. To further illustrate the value of our formulation, we also propose a new depth map refinement method that consistently improve the performance of state-of-the-art monocular depth estimation methods.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
As DOOBNet and OFNet are coded in Caffe, in order to have an unified platform for experimenting them on new datasets, we carefully re-implemented them in PyTorch (following the Caffe code). We could not reproduce exactly the same quantitative values provided in the original papers (ODS and OIS metrics are a bit less while AP is a bit better), probably due to some intrinsic differences between frameworks Caffe and PyTorch, however, the difference is very small (less than 0.03, cf. Table 2).
References
Acuna, D., Kar, A., Fidler, S.: Devil is in the edges: learning semantic boundaries from noisy annotations. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11075–11083 (2019)
Apostoloff, N., Fitzgibbon, A.: Learning spatiotemporal t-junctions for occlusion detection. In: Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 553–559. IEEE (2005)
Barron, J.T., Poole, B.: The fast bilateral solver. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 617–632. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_38
Boulch, A., Marlet, R.: Fast and robust normal estimation for point clouds with sharp features. Comput. Graph. Forum (CGF) 31(5), 1765–1774 (2012)
Cooper, M.C.: Interpreting line drawings of curved objects with tangential edges and surfaces. Image Vis. Comput. 15(4), 263–276 (1997)
Dollár, P., Zitnick, C.L.: Fast edge detection using structured forests. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 37(8), 1558–1570 (2014)
Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2650–2658 (2015)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems (NeurIPS), pp. 2366–2374. Curran Associates, Inc. (2014)
Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2002–2011 (2018)
Fu, H., Wang, C., Tao, D., Black, M.J.: Occlusion boundary detection via deep exploration of context. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 241–250 (2016)
Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 270–279 (2017)
He, K., Sun, J., Tang, X.: Guided image filtering. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 1–14. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15549-9_1
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
He, X., Yuille, A.: Occlusion boundary detection using pseudo-depth. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 539–552. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_39
Heise, P., Klose, S., Jensen, B., Knoll, A.: PM-Huber: patchmatch with Huber regularization for stereo matching. In: International Conference on Computer Vision (ICCV), pp. 2360–2367 (2013)
Heo, M., Lee, J., Kim, K.-R., Kim, H.-U., Kim, C.-S.: Monocular depth estimation using whole strip masking and reliability-based refinement. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 39–55. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_3
Hoiem, D., Efros, A.A., Hebert, M.: Recovering occlusion boundaries from an image. Int. J. Comput. Vis. (IJCV) 91, 328–346 (2010)
Hong, Z., Chen, Z., Wang, C., Mei, X., Prokhorov, D., Tao, D.: Multi-store tracker (muster): a cognitive psychology inspired approach to object tracking. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 749–758 (2015)
Ilg, E., Saikia, T., Keuper, M., Brox, T.: Occlusions, motion and depth boundaries with a generic network for disparity, optical flow or scene flow estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 626–643. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_38
Jiao, J., Cao, Y., Song, Y., Lau, R.: Look deeper into depth: monocular depth estimation with semantic booster and attention-driven loss. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 55–71. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_4
Koch, T., Liebel, L., Fraundorfer, F., Körner, M.: Evaluation of CNN-based single-image depth estimation methods. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11131, pp. 331–348. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11015-4_25
Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks. In: International Conference on 3D Vision (3DV), pp. 239–248. IEEE (2016)
Lee, J.H., Kim, C.S.: Monocular depth estimation using relative depth maps. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9729–9738 (2019)
Leichter, I., Lindenbaum, M.: Boundary ownership by lifting to 2.1-D. In: International Conference on Computer Vision (ICCV), pp. 9–16. IEEE (2008)
Li, J.Y., Klein, R., Yao, A.: A two-streamed network for estimating fine-scaled depth maps from single RGB images. In: International Conference on Computer Vision (ICCV), pp. 3392–3400 (2016)
Li, W., et al.: InteriorNet: mega-scale multi-sensor photo-realistic indoor scenes dataset. In: British Machine Vision Conference (BMVC) (2018)
Liu, C., Yang, J., Ceylan, D., Yumer, E., Furukawa, Y.: PlaneNet: piece-wise planar reconstruction from a single RGB image. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2579–2588 (2018)
Liu, F., Shen, C., Lin, G.: Deep convolutional neural fields for depth estimation from a single image. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Liu, Y., Cheng, M.M., Fan, D.P., Zhang, L., Bian, J., Tao, D.: Semantic edge detection with diverse deep supervision. arXiv preprint arXiv:1804.02864 (2018)
Lu, R., Xue, F., Zhou, M., Ming, A., Zhou, Y.: Occlusion-shared and feature-separated network for occlusion relationship reasoning. In: International Conference on Computer Vision (ICCV) (2019)
Martin, D.R., Fowlkes, C.C., Malik, J.: Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 26(5), 530–549 (2004)
Massa, F., Marlet, R., Aubry, M.: Crafting a multi-task CNN for viewpoint estimation. In: British Machine Vision Conference (BMVC) (2016)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54
Nitzberg, M., Mumford, D.B.: The 2.1-D Sketch. IEEE Computer Society Press (1990)
Oberweger, M., Rad, M., Lepetit, V.: Making deep heatmaps robust to partial occlusions for 3D object pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 125–141. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_8
Peng, S., Liu, Y., Huang, Q., Zhou, X., Bao, H.: PVNet: pixel-wise voting network for 6DoF pose estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4561–4570 (2019)
Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: International Conference on Computer Vision (ICCV), pp. 3828–3836 (2017)
Rafi, U., Gall, J., Leibe, B.: A semantic occlusion model for human pose estimation from a single depth image. In: Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), pp. 67–74 (2015)
Ramamonjisoa, M., Du, Y., Lepetit, V.: Predicting sharp and accurate occlusion boundaries in monocular depth estimation using displacement fields. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14648–14657 (2020)
Ramamonjisoa, M., Lepetit, V.: SharpNet: fast and accurate recovery of occluding contours in monocular depth estimation. In: International Conference on Computer Vision Workshops (ICCV Workshops) (2019)
Raskar, R., Tan, K.H., Feris, R., Yu, J., Turk, M.: Non-photorealistic camera: depth edge detection and stylized rendering using multi-flash imaging. ACM Trans. Graph. (TOG) 23(3), 679–688 (2004)
Ren, X., Fowlkes, C.C., Malik, J.: Figure/ground assignment in natural images. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 614–627. Springer, Heidelberg (2006). https://doi.org/10.1007/11744047_47
Ricci, E., Ouyang, W., Wang, X., Sebe, N., et al.: Monocular depth estimation using multi-scale continuous CRFs as sequential deep networks. IEEE Trans. Pattern Anal. Mach. Intell.(PAMI) 41(6), 1426–1440 (2018)
Roberts, L.G.: Machine perception of three-dimensional solids. Ph.D. thesis, Massachusetts Institute of Technology (1963)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing & Computer Assisted Intervention (MICCAI) (2015)
Stein, A.N., Hebert, M.: Occlusion boundaries from motion: low-level detection and mid-level reasoning. Int. J. Comput. Vis. (IJCV) 82, 325–357 (2008)
Su, H., Jampani, V., Sun, D., Gallo, O., Learned-Miller, E., Kautz, J.: Pixel-adaptive convolutional neural networks. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11166–11175 (2019)
Sugihara, K.: Machine Interpretation of Line Drawings, vol. 1. MIT press Cambridge (1986)
Teo, C., Fermuller, C., Aloimonos, Y.: Fast 2D border ownership assignment. In: Conference on Computer Vision and Pattern Recognition (CVPR) pp. 5117–5125 (2015)
Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: International Conference on Computer Vision (ICCV), pp. 839–846 (1998)
Wang, G., Liang, X., Li, F.W.B.: DOOBNet: deep object occlusion boundary detection from an image. In: Asian Conference on Computer Vision (ACCV) (2018)
Wang, P., Shen, X., Russell, B., Cohen, S., Price, B., Yuille, A.L.: Surge: surface regularized geometry estimation from a single image. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 172–180 (2016)
Wang, P., Yuille, A.: DOC: deep OCclusion estimation from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 545–561. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_33
Wang, Y., Yang, Y., Yang, Z., Zhao, L., Wang, P., Xu, W.: Occlusion aware unsupervised learning of optical flow. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4884–4893 (2018)
Wu, H., Zheng, S., Zhang, J., Huang, K.: Fast end-to-end trainable guided filter. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1838–1847 (2018)
Xie, S., Tu, Z.: Holistically-nested edge detection. In: International Conference on Computer Vision (ICCV), pp. 1395–1403 (2015)
Xu, D., Ricci, E., Ouyang, W., Wang, X., Sebe, N.: Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5354–5362 (2017)
Yin, W., Liu, Y., Shen, C., Yan, Y.: Enforcing geometric constraints of virtual normal for depth prediction. In: International Conference on Computer Vision (ICCV) (2019)
Yu, Z., Feng, C., Liu, M.Y., Ramalingam, S.: CASENet: deep category-aware semantic edge detection. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5964–5973 (2017)
Yu, Z., et al.: Simultaneous edge alignment and learning. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 400–417. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_24
Zheng, C., Cham, T.-J., Cai, J.: T\(^2\)Net: synthetic-to-realistic translation for solving single-image depth estimation tasks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 798–814. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_47
Zitnick, C.L., Kanade, T.: A cooperative algorithm for stereo matching and occlusion detection. IEEE Trans. Pattern Anal. Mach. Intell.(PAMI) 22(7), 675–684 (2000)
Acknowledgements.
We thank Yuming Du and Michael Ramamonjisoa for helpful discussions and for offering their GT annotations of occlusion boundaries for a large part of NYUv2, which we completed (NYUv2-OC++) [39]. This work was partly funded by the I-Site FUTURE initiative, through the DiXite project.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Qiu, X., Xiao, Y., Wang, C., Marlet, R. (2020). Pixel-Pair Occlusion Relationship Map (P2ORM): Formulation, Inference and Application. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12349. Springer, Cham. https://doi.org/10.1007/978-3-030-58548-8_40
Download citation
DOI: https://doi.org/10.1007/978-3-030-58548-8_40
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58547-1
Online ISBN: 978-3-030-58548-8
eBook Packages: Computer ScienceComputer Science (R0)