APAC-Net: Unsupervised Learning of Depth and Ego-Motion from Monocular Video

  • Conference paper
Intelligence Science and Big Data Engineering. Visual Data Engineering (IScIDE 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11935)

Abstract

We propose a novel unsupervised method, the Attention-Pixel and Attention-Channel Network (APAC-Net), for monocular learning of scene depth and ego-motion estimation. Our model uses only monocular image sequences and needs no additional sensor information, such as IMU or GPS, for supervision. An attention mechanism is employed in APAC-Net to improve the network's efficiency: specifically, three attention modules are proposed to adjust feature weights during training. Moreover, to minimize the effect of the noise produced during reconstruction, an image-reconstruction loss based on PSNR, \(L_{PSNR}\), is used to evaluate reconstruction quality. In addition, because depth estimation often fails for objects close to the camera, a temporal-consistency loss \(L_{Temp}\) between adjacent frames and a scale-based loss \(L_{Scale}\) across different scales are proposed. Experimental results show that APAC-Net performs well on both the depth and ego-motion tasks, and even outperforms prior methods on several metrics on KITTI and Cityscapes.
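The abstract only names the attention modules and the three losses, so the sketches below are illustrative guesses rather than the authors' implementation. First, a minimal PyTorch sketch of a channel-attention module in the squeeze-and-excitation style this line of work commonly builds on; the reduction ratio and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Hypothetical attention-channel module (squeeze-and-excitation
    style): pool each channel to a scalar, pass the descriptor through
    a bottleneck MLP, and rescale the feature map channel-wise."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # adjust feature weights per channel
```

Likewise, plausible forms for \(L_{PSNR}\), \(L_{Temp}\), and \(L_{Scale}\), assuming that PSNR is negated to become a minimizable objective, that temporal consistency compares a depth map against the next frame's depth warped into the same view, and that scale consistency compares upsampled coarse depth maps against the finest one; the actual formulations and weightings are not given on this page.

```python
import torch
import torch.nn.functional as F

def psnr_loss(recon, target, max_val=1.0, eps=1e-8):
    """Hypothetical L_PSNR: PSNR = 10 * log10(max_val^2 / MSE),
    negated so that better reconstructions give lower loss."""
    mse = F.mse_loss(recon, target) + eps
    return -10.0 * torch.log10(max_val ** 2 / mse)

def temporal_consistency_loss(depth_t, depth_t1_warped):
    """Hypothetical L_Temp: L1 disagreement between frame t's depth
    and frame t+1's depth warped into frame t's view (the warping,
    driven by the predicted ego-motion, is assumed and not shown)."""
    return (depth_t - depth_t1_warped).abs().mean()

def scale_consistency_loss(depth_pyramid):
    """Hypothetical L_Scale: upsample each coarser depth map to the
    finest resolution and penalize L1 disagreement."""
    finest = depth_pyramid[0]
    loss = 0.0
    for coarse in depth_pyramid[1:]:
        up = F.interpolate(coarse, size=finest.shape[-2:],
                           mode="bilinear", align_corners=False)
        loss = loss + (finest - up).abs().mean()
    return loss / max(len(depth_pyramid) - 1, 1)
```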

G. Lu—This work was supported by the NSFC fund (61332011), the Shenzhen Fundamental Research funds (JCYJ20170811155442454, JCYJ20180306172023949), and the Medical Biometrics Perception and Analysis Engineering Laboratory, Shenzhen, China.



Author information


Corresponding author

Correspondence to Guangming Lu.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Lin, R., Lu, Y., Lu, G. (2019). APAC-Net: Unsupervised Learning of Depth and Ego-Motion from Monocular Video. In: Cui, Z., Pan, J., Zhang, S., Xiao, L., Yang, J. (eds) Intelligence Science and Big Data Engineering. Visual Data Engineering. IScIDE 2019. Lecture Notes in Computer Science, vol 11935. Springer, Cham. https://doi.org/10.1007/978-3-030-36189-1_28

  • DOI: https://doi.org/10.1007/978-3-030-36189-1_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36188-4

  • Online ISBN: 978-3-030-36189-1

  • eBook Packages: Computer Science, Computer Science (R0)
