Abstract
Stereo estimation has advanced considerably in recent years with the introduction of deep learning. However, the traditional supervised approach to deep learning requires accurate and plentiful ground-truth data, which is expensive to create and unavailable in many situations. This is especially true for remote sensing applications, where large amounts of data are available without proper ground truth. To tackle this problem, we propose a self-supervised CNN with self-improving adaptive abilities. In the first iteration, the created disparity map is inaccurate and noisy. Applying a left-right consistency check yields a sparse but more accurate disparity map, which is used as an initial pseudo ground truth. This pseudo ground truth is then adapted and updated after every epoch during training. We use the sum of inconsistent points to track network convergence. The code for our method will be made available after acceptance at https://github.com/thedodo/SAda-Net
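The left-right consistency check mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `left_right_consistency`, the one-pixel `threshold`, the convention that a left pixel at column x matches the right pixel at column x - d, and the use of `None` to mark rejected pixels are all assumptions made for illustration. The sketch keeps only the left disparities that the right disparity map confirms and returns the number of rejected pixels, which corresponds to the kind of inconsistency count the abstract describes for tracking convergence.

```python
from typing import List, Optional, Tuple


def left_right_consistency(
    disp_left: List[List[float]],
    disp_right: List[List[float]],
    threshold: float = 1.0,  # assumed tolerance in pixels, not from the paper
) -> Tuple[List[List[Optional[float]]], int]:
    """Return a sparse left disparity map and the count of inconsistent pixels."""
    sparse: List[List[Optional[float]]] = []
    inconsistent = 0
    for y, row in enumerate(disp_left):
        width = len(row)
        out_row: List[Optional[float]] = []
        for x, d in enumerate(row):
            # Assumed convention: left pixel x corresponds to right pixel x - d.
            xr = x - int(round(d))
            if 0 <= xr < width and abs(d - disp_right[y][xr]) <= threshold:
                out_row.append(d)  # both views agree, keep the disparity
            else:
                out_row.append(None)  # out of bounds or disagreement, reject
                inconsistent += 1
        sparse.append(out_row)
    return sparse, inconsistent


# Toy single-row example: the first pixel maps outside the image and the
# last pixel disagrees with the right map, so both are rejected.
sparse, n_bad = left_right_consistency(
    [[1.0, 1.0, 1.0, 3.0]],
    [[1.0, 1.0, 5.0, 0.0]],
)
```

In a training loop, the surviving entries of `sparse` would play the role of the pseudo ground truth, and `n_bad` could be logged once per epoch; how the actual method adapts and updates the pseudo ground truth is not specified in the abstract and is not modelled here.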
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hirner, D., Fraundorfer, F. (2025). SAda-Net: A Self-supervised Adaptive Stereo Estimation CNN For Remote Sensing Image Data. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15310. Springer, Cham. https://doi.org/10.1007/978-3-031-78192-6_11
DOI: https://doi.org/10.1007/978-3-031-78192-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78191-9
Online ISBN: 978-3-031-78192-6
eBook Packages: Computer Science, Computer Science (R0)