APAC-Net: Unsupervised Learning of Depth and Ego-Motion from Monocular Video

  • Conference paper
Intelligence Science and Big Data Engineering. Visual Data Engineering (IScIDE 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11935)

Abstract

We propose a novel unsupervised method, the Attention-Pixel and Attention-Channel Network (APAC-Net), for monocular learning of scene depth and ego-motion estimation. Our model uses only monocular image sequences and needs no additional sensor information, such as IMU or GPS, for supervision. An attention mechanism is employed in APAC-Net to improve the network's efficiency: specifically, three attention modules are proposed to adjust feature weights during training. Moreover, to minimize the effect of the noise produced during reconstruction, an image-reconstruction loss based on PSNR, \(L_{PSNR}\), is used to evaluate reconstruction quality. In addition, because depth estimation often fails for objects close to the camera, a temporal-consistency loss \(L_{Temp}\) between adjacent frames and a scale-based loss \(L_{Scale}\) across different scales are proposed. Experimental results show that APAC-Net performs well on both the depth and ego-motion tasks, and even outperforms prior methods on several metrics on KITTI and Cityscapes.
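The abstract only names the attention modules and the three losses, so the sketches below are illustrative guesses rather than the authors' implementation. First, a minimal PyTorch sketch of a channel-attention module in the squeeze-and-excitation style this line of work commonly builds on; the reduction ratio and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Hypothetical attention-channel module (squeeze-and-excitation
    style): pool each channel to a scalar, pass the descriptor through
    a bottleneck MLP, and rescale the feature map channel-wise."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # adjust feature weights per channel
```

Likewise, plausible forms for \(L_{PSNR}\), \(L_{Temp}\), and \(L_{Scale}\), assuming that PSNR is negated to become a minimizable objective, that temporal consistency compares a depth map against the next frame's depth warped into the same view, and that scale consistency compares upsampled coarse depth maps against the finest one; the actual formulations and weightings are not given on this page.

```python
import torch
import torch.nn.functional as F

def psnr_loss(recon, target, max_val=1.0, eps=1e-8):
    """Hypothetical L_PSNR: PSNR = 10 * log10(max_val^2 / MSE),
    negated so that better reconstructions give lower loss."""
    mse = F.mse_loss(recon, target) + eps
    return -10.0 * torch.log10(max_val ** 2 / mse)

def temporal_consistency_loss(depth_t, depth_t1_warped):
    """Hypothetical L_Temp: L1 disagreement between frame t's depth
    and frame t+1's depth warped into frame t's view (the warping,
    driven by the predicted ego-motion, is assumed and not shown)."""
    return (depth_t - depth_t1_warped).abs().mean()

def scale_consistency_loss(depth_pyramid):
    """Hypothetical L_Scale: upsample each coarser depth map to the
    finest resolution and penalize L1 disagreement."""
    finest = depth_pyramid[0]
    loss = 0.0
    for coarse in depth_pyramid[1:]:
        up = F.interpolate(coarse, size=finest.shape[-2:],
                           mode="bilinear", align_corners=False)
        loss = loss + (finest - up).abs().mean()
    return loss / max(len(depth_pyramid) - 1, 1)
```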

G. Lu—This work was supported by the NSFC fund (61332011), the Shenzhen Fundamental Research funds (JCYJ20170811155442454, JCYJ20180306172023949), and the Medical Biometrics Perception and Analysis Engineering Laboratory, Shenzhen, China.



Author information


Corresponding author

Correspondence to Guangming Lu.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Lin, R., Lu, Y., Lu, G. (2019). APAC-Net: Unsupervised Learning of Depth and Ego-Motion from Monocular Video. In: Cui, Z., Pan, J., Zhang, S., Xiao, L., Yang, J. (eds) Intelligence Science and Big Data Engineering. Visual Data Engineering. IScIDE 2019. Lecture Notes in Computer Science, vol 11935. Springer, Cham. https://doi.org/10.1007/978-3-030-36189-1_28

  • DOI: https://doi.org/10.1007/978-3-030-36189-1_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36188-4

  • Online ISBN: 978-3-030-36189-1

  • eBook Packages: Computer Science, Computer Science (R0)
