MD-ST: Monocular Depth Estimation Based on Spatio-Temporal Correlation Features

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12305)

Abstract

Monocular depth estimation methods based on deep learning have recently shown very promising results; most of them exploit deep convolutional neural networks (CNNs) with scene geometric constraints. However, the depth maps estimated by most existing methods still suffer from problems such as unclear object contours and unsmooth depth gradients. In this paper, we propose a novel encoder-decoder network, named Monocular Depth estimation with Spatio-Temporal features (MD-ST), which is based on recurrent convolutional neural networks and estimates depth from monocular video using spatio-temporal correlation features. Specifically, we put forward a novel encoder with a convolutional long short-term memory (Conv-LSTM) structure, which not only captures the spatial features of the scene but also collects temporal features from video sequences. In the decoder, we learn depth maps at four scales for multi-scale estimation to fine-tune the outputs. Additionally, to enhance and maintain spatio-temporal consistency, we constrain our network with a flow consistency loss that penalizes the errors between the estimated and ground-truth maps by learning residual flow vectors. Experiments conducted on the KITTI dataset demonstrate that the proposed MD-ST can effectively estimate scene depth maps, especially in dynamic scenes, and that it outperforms existing monocular depth estimation methods.
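
For orientation, the Conv-LSTM structure named in the abstract is usually written in the form introduced by Shi et al. (2015) for precipitation nowcasting. The equations below reproduce that standard formulation as a reference sketch; the abstract does not say whether the MD-ST encoder uses exactly this variant (for example, with or without the peephole terms), so the notation here is an assumption. X_t is the input feature map at frame t, H_t the hidden state, C_t the cell state, * denotes convolution, and \circ the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Because the gates are computed by convolutions rather than fully connected products, H_t and C_t keep the spatial layout of the input, which is what lets a recurrent encoder of this kind carry both spatial and temporal information across video frames.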

The work presented in this paper is supported by the Beijing Natural Science Foundation of China (Grant No. L182033) and the Fund for Beijing University of Posts and Telecommunications (2019PTB-001).

Author information

Corresponding author

Correspondence to Xuyang Meng.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Meng, X., Fan, C., Ming, Y., Zhang, R., Zhao, P. (2020). MD-ST: Monocular Depth Estimation Based on Spatio-Temporal Correlation Features. In: Peng, Y., et al. Pattern Recognition and Computer Vision. PRCV 2020. Lecture Notes in Computer Science, vol 12305. Springer, Cham. https://doi.org/10.1007/978-3-030-60633-6_29

  • DOI: https://doi.org/10.1007/978-3-030-60633-6_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60632-9

  • Online ISBN: 978-3-030-60633-6

  • eBook Packages: Computer Science, Computer Science (R0)
