Abstract
A new framework called Rigid Depth Constructor (RDC) is proposed, allowing users to create their own datasets for validating depth map estimation algorithms in the context of autonomous navigation. Compared to existing tools that rely on a high-quality fixed Lidar sensor, RDC is usable in low-cost setups requiring only a camera and any Lidar sensor (e.g. handheld or UAV-carried), which makes scanning a scene more flexible and much faster. Furthermore, unlike photogrammetry tools that use sparse RGB views, it can be applied to smooth videos while remaining computationally tractable. The framework includes a test suite for extracting insightful information about the evaluated algorithm. As examples, validation videos made from UAV footage are provided to evaluate two depth prediction algorithms initially tested on in-car driving video datasets; the results show that the drone context is dramatically different. This supports the need to benchmark depth estimation algorithms on a dataset that fits one’s particular context, which often means creating a brand new one. An open-source implementation accompanies the paper, designed to be as user-friendly as possible, to make depth dataset creation possible even for small teams. The key contributions are the following: (1) a complete, open-source and almost fully automatic software application for creating validation datasets with densely annotated depth, adaptable to a wide variety of image, video and range data; (2) selection tools to adapt the dataset to specific validation needs, and tools for conversion to other dataset formats; (3) as use case examples, two new real datasets, outdoor and indoor, readily usable in a UAV navigation context, which are used as test sets in the evaluation of two depth prediction algorithms with a collection of comprehensive (e.g. distribution-based) metrics.
Notes
i.e. when the 3D scene and/or the camera position produces an image in which the perception of distance or object size is ambiguous, or even deceptive (famous extreme examples are the Ames room and the corridor illusion).
Acknowledgements
Acquisitions for the Manoir dataset were made in collaboration with the AIRD’ECO-Drone company, thanks to the financial support of the Parrot company. Acquisitions for the University hall dataset were made entirely by Clément Pinard, thanks to the equipment and training provided by the Geomesure company.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest or competing interest related to this work.
About this article
Cite this article
Pinard, C., Manzanera, A. Does it work outside this benchmark? Introducing the rigid depth constructor tool. Multimed Tools Appl 82, 41641–41667 (2023). https://doi.org/10.1007/s11042-023-14743-0
DOI: https://doi.org/10.1007/s11042-023-14743-0