KIL: Knowledge Interactiveness Learning for Joint Depth Estimation and Semantic Segmentation

Zhou, Ling; Xu, Chunyan; Cui, Zhen; Yang, Jian

doi:10.1007/978-3-030-41404-7_59

Ling Zhou¹²,
Chunyan Xu¹²,
Zhen Cui¹² &
…
Jian Yang¹²

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12046))

Included in the following conference series:

Asian Conference on Pattern Recognition

1436 Accesses
1 Citations

Abstract

Depth estimation and semantic segmentation are two important yet challenging tasks in the field of pixel-level scene understanding. Previous works often solve the two tasks as the parallel decoding/modeling process, but cannot well consider strongly correlated relationships between these tasks. In this paper, given an input image, we propose to learn knowledge interactiveness of depth estimation and semantic segmentation for jointly predicting results in an end-to-end way. Especially, the key Knowledge Interactiveness Learning (KIL) module can mine and leverage the connections/complementations of these two tasks effectively. Furthermore, the network parameters can be jointly optimized for boosting the final predictions of depth estimation and semantic segmentation with the coarse-to-fine strategy. Extensive experiments on SUN-RGBD and NYU Depth-V2 datasets demonstrate state-of-the-art performance of the proposed unified framework for both depth estimation and semantic segmentation tasks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Kendall, A., Badrinarayanan, V., Cipolla, R.: Bayesian segNet: model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv preprint arXiv:1511.02680 (2015)
Jalali, A., Sanghavi, S., Ruan, C., Ravikumar, P.K.: A dirty model for multi-task learning. In: Advances in Neural Information Processing Systems, pp. 964–972 (2010)
Google Scholar
Roy, A., Todorovic, S.: Monocular depth estimation using neural regression forest. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5506–5514 (2016)
Google Scholar
Saxena, A., Sun, M., Ng, A.Y.: Make3D: learning 3d scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 824–840 (2008)
Article Google Scholar
Saxena, A., Chung, S.H., Ng, A.Y.: Learning depth from single monocular images. In: Advances in Neural Information Processing Systems, pp. 1161–1168 (2006)
Google Scholar
Li, B., Shen, C., Dai, Y., Van Den Hengel, A., He, M.: Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1119–1127 (2015)
Google Scholar
Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
Article MathSciNet Google Scholar
Xu, D., Ricci, E., Ouyang, W., Wang, X., Sebe, N.: Multi-scale continuous crfs as sequential deep networks for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5354–5362 (2017)
Google Scholar
Xu, D., Ouyang, W., Wang, X., Sebe, N.: Pad-Net: multi-tasks guided prediction-and-distillation network for simultaneous depth estimation and scene parsing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 675–684 (2018)
Google Scholar
Ilea, D.E., Whelan, P.F.: Image segmentation based on the integration of colour-texture descriptors-a review. Pattern Recogn. 44(10–11), 2479–2501 (2011)
Article Google Scholar
Daniel, W., Remi, R., Edmond, B.: A survey of vision-based methods for action representation, segmentation and recognition. Comput. Vis. Image Understand. 115(2), 224–241 (2011)
Article Google Scholar
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems, pp. 2366–2374 (2014)
Google Scholar
Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2650–2658 (2015)
Google Scholar
Lin, D., Chen, G., Cohen-Or, D., Heng, P.A., Huang, H.: Cascaded feature network for semantic segmentation of RGB-D images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1311–1319 (2017)
Google Scholar
Liu, F., Shen, C., Lin, G.: Deep convolutional neural fields for depth estimation from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5162–5170 (2015)
Google Scholar
Liu, F., Shen, C., Lin, G., Reid, I.: Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2024–2039 (2015)
Article Google Scholar
Lin, G., Milan, A., Shen, C., Reid, I.: RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1925–1934 (2017)
Google Scholar
Lin, G., Shen, C., Van Den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3194–3203 (2016)
Google Scholar
Hadsell, R., et al.: Learning long-range vision for autonomous off-road driving. J. Field Robot. 26, 120–144 (2009)
Article Google Scholar
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
Google Scholar
Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2002–2011 (2018)
Google Scholar
Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 239–248 (2016)
Google Scholar
Misra, I., Shrivastava, A., Gupta, A., Hebert, M.: Cross-stitch networks for multi-task learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3994–4003 (2016)
Google Scholar
Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Google Scholar
Nath Kundu, J., Krishna Uppala, P., Pahuja, A., Venkatesh Babu, R.: AdaDepth: unsupervised content congruent adaptation for depth estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2656–2665 (2018)
Google Scholar
Baxter, J.: A Bayesian/information theoretic model of learning to learn via multiple task sampling. Mach. Learn. 28(1), 7–39 (1997)
Article Google Scholar
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Tateno, K., Tombari, F., Laina, I., Navab, N.: CNN-SLAM: real-time dense monocular slam with learned depth prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6243–6252 (2017)
Google Scholar
Long, M., Wang, J.: Learning multiple tasks with deep relationship networks. arXiv preprint arXiv:1506.02117 2 (2015)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54
Chapter Google Scholar
Wang, P., Shen, X., Lin, Z., Cohen, S., Price, B., Yuille, A.L.: Towards unified depth and semantic prediction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2800–2809 (2015)
Google Scholar
Jarvis, R.A.: A perspective on range finding techniques for computer vision. IEEE Trans. Pattern Anal. Mach. Intell. PAMI–5(2), 122–139 (1983)
Article Google Scholar
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distribution and tlle Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 721–741
Google Scholar
Park, S.J., Hong, K.S. and Lee, S.: RDFNet: RGB-D multi-level residual feature fusion for indoor semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4980–4989 (2017)
Google Scholar
Liu, S., Johns, E., Davison, A.J.: End-to-end multi-task learning with attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1871–1880 (2019)
Google Scholar
Song, S., Lichtenberg, S.P., Xiao, J.: SUN RGB-D: A RGB-D scene understanding benchmark suite. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 567–576 (2015)
Google Scholar
Evgeniou, T., Micchelli, C.A., Pontil, M.: Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6(Apr), 615–637 (2005)
MathSciNet MATH Google Scholar
Gebru, T., Hoffman, J., Fei-Fei, L.: Fine-grained recognition in the wild: a multi-task domain adaptation approach. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1349–1358 (2017)
Google Scholar
Qi, X., Liao, R., Jia, J., Fidler, S., Urtasun, R.: 3d graph neural networks for RGBD semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5199–5208 (2017)
Google Scholar
Yang, X., Gao, Y., Luo, H., Liao, C., Cheng, K.T.: Bayesian DeNet: monocular depth prediction and frame-wise fusion with synchronized uncertainty. IEEE Trans. Multimed. 21(11), 2701–2713 (2019)
Article Google Scholar
Cheng, Y., Cai, R., Li, Z., Zhao, X., Huang, K.: Locality-sensitive deconvolution networks with gated fusion for RGB-D indoor semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3029–3037 (2017)
Google Scholar
Yang, Y., Hospedales, T.M.: Trace norm regularised deep multi-task learning. arXiv preprint arXiv:1606.04038 (2016)
Liu, Y., Guo, Y., Lew, M.S.: On the exploration of convolutional fusion networks for visual recognition. In: International Conference on Multimedia Modeling, pp. 277–289 (2017)
Google Scholar
Chen, Y., Li, W., Chen, X., Gool, L.V.: Learning semantic segmentation from synthetic data: a geometrically guided input-output adaptation approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1841–1850 (2019)
Google Scholar
Li, Z., Gan, Y., Liang, X., Yu, Y., Cheng, H., Lin, L.,: LSTM-CF: unifying context modeling and fusion with LSTMs for RGB-D scene labeling. In: European Conference on Computer Vision, pp. 541–557 (2016)
Chapter Google Scholar
Zhang, Z., Cui, Z., Xu, C., Yan, Y., Sebe, N., Yang, J.: Pattern-affinitive propagation across depth, surface normal and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4106–4115 (2019)
Google Scholar
Zhang, Z., Cui, Z., Xu, C., Jie, Z., Li, X., Yang, J.: Joint task-recursive learning for semantic segmentation and depth estimation. In: Proceedings of the European Conference on Computer Vision, pp. 235–251 (2018)
Chapter Google Scholar

Download references

Acknowlegdement

This work was supported by the National Natural Science Foundation of China (Grants Nos. 61972204, 61772276, 61906094), the Natural Science Foundation of Jiangsu Province (Grant No. BK20191283), the fundamental research funds for the central universities (No. 30919011232).

Author information

Authors and Affiliations

Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Ling Zhou, Chunyan Xu, Zhen Cui & Jian Yang

Authors

Ling Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Chunyan Xu
View author publications
You can also search for this author in PubMed Google Scholar
Zhen Cui
View author publications
You can also search for this author in PubMed Google Scholar
Jian Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Chunyan Xu .

Editor information

Editors and Affiliations

University of Malaya, Kuala Lumpur, Malaysia
Shivakumara Palaiahnakote
Consiglio Nazionale delle Ricerche, ICAR, Naples, Italy
Gabriella Sanniti di Baja
Chinese Academy of Sciences, Beijing, China
Liang Wang
Auckland University of Technology, Auckland, New Zealand
Wei Qi Yan

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhou, L., Xu, C., Cui, Z., Yang, J. (2020). KIL: Knowledge Interactiveness Learning for Joint Depth Estimation and Semantic Segmentation. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science(), vol 12046. Springer, Cham. https://doi.org/10.1007/978-3-030-41404-7_59

Download citation

DOI: https://doi.org/10.1007/978-3-030-41404-7_59
Published: 23 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41403-0
Online ISBN: 978-3-030-41404-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics