Abstract
Monocular depth estimation is a classical computer vision task that plays a crucial role in 3D reconstruction and scene understanding. Recent works based on deep convolutional neural networks (DCNNs) have achieved great success. However, these works do not make full use of structural information, resulting in discontinuous depth and ambiguous boundaries. This paper proposes the Dual Attention Feature Fusion Network (DAFFNet), which exploits the structural relationship between the RGB image and the predicted depth to address these problems. It contains two critical modules: the Dual Attention Fusion Module (DAFM) and the Iterative Dual Attention Fusion Module (IDAFM). Specifically, DAFM comprises two blocks, a spatial attention block and a channel attention block, which fuse global context and local information, respectively. To aggregate information across encoder and decoder levels, we design IDAFM, an iterative version of DAFM. Extensive experiments on the KITTI dataset show that our model achieves performance competitive with recent state-of-the-art models.
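The abstract describes fusing global context via channel attention and local information via spatial attention. The paper's exact module definitions are not reproduced here; the following is a minimal NumPy sketch of the general dual-attention fusion pattern (SE-style channel gating plus a spatial gate), with the `dual_attention_fuse` signature and the additive two-branch combination being illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    # x: (C, H, W). Squeeze the spatial dimensions to a per-channel
    # statistic, then gate each channel (captures global context).
    w = sigmoid(x.mean(axis=(1, 2)))        # (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    # Pool across channels to a single attention map, then gate each
    # spatial location (preserves local structure).
    m = sigmoid(x.mean(axis=0))             # (H, W)
    return x * m[None, :, :]

def dual_attention_fuse(enc_feat, dec_feat):
    # Hypothetical fusion of same-shaped encoder and decoder features:
    # merge them, then combine the channel- and spatial-attention
    # branches additively. Output shape matches the inputs.
    x = enc_feat + dec_feat
    return channel_attention(x) + spatial_attention(x)
```

An "iterative" variant in the spirit of IDAFM could simply reapply such a fusion block across successive encoder/decoder levels; in a real network the pooling and gating would be learned layers rather than fixed means.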
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Xu, Y., Li, M., Peng, C., Li, Y., Du, S. (2021). Dual Attention Feature Fusion Network for Monocular Depth Estimation. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science(), vol 13069. Springer, Cham. https://doi.org/10.1007/978-3-030-93046-2_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93045-5
Online ISBN: 978-3-030-93046-2
eBook Packages: Computer Science (R0)