MEStereo-Du2CNN: a dual-channel CNN for learning robust depth estimates from multi-exposure stereo images for HDR 3D applications

Choudhary, Rohit; Sharma, Mansi; Uma, T. V.; Anil, Rithvik

doi:10.1007/s00371-023-02912-z

Original article
Published: 05 July 2023

Volume 40, pages 2219–2233, (2024)

The Visual Computer


Abstract

Display technologies have evolved over the years, and practical HDR capture, processing, and display solutions are critical to bringing 3D technologies to the next level. Depth estimation from multi-exposure stereo image sequences is an essential task in the development of cost-effective 3D HDR video content. In this paper, we develop a deep architecture for multi-exposure stereo depth estimation. The proposed architecture has two novel components. First, the stereo matching technique used in traditional stereo depth estimation is revamped: the stereo depth estimation component of our architecture deploys a mono-to-stereo transfer learning approach. The proposed formulation circumvents the cost volume construction requirement, which is replaced by a dual-encoder single-decoder CNN with different weights for feature fusion. EfficientNet-based blocks are used to learn the disparity. Second, we combine disparity maps obtained from the stereo images at different exposure levels using a robust disparity feature fusion approach. The disparity maps obtained at different exposures are merged using weight maps calculated for different quality measures. The final predicted disparity map is more robust and retains the best features, preserving the depth discontinuities. The proposed CNN offers the flexibility to train on standard dynamic range stereo data or on multi-exposure low dynamic range stereo sequences. The proposed model surpasses state-of-the-art monocular and stereo depth estimation methods, both quantitatively and qualitatively, on the challenging Scene Flow and differently exposed Middlebury stereo datasets. The architecture also performs exceedingly well on complex natural scenes, demonstrating its usefulness for diverse 3D HDR applications.
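The weight-map fusion of per-exposure disparity maps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the quality measures, so the two used here (local contrast and well-exposedness of the input image, in the spirit of exposure fusion) are assumptions, as are the function names `laplacian_contrast`, `well_exposedness`, and `fuse_disparities` and the parameters `sigma` and `eps`. This is not the authors' implementation, only one plausible reading of a per-pixel weighted merge.

```python
import numpy as np

def laplacian_contrast(img):
    """Absolute 4-neighbour Laplacian response, borders replicated (assumed measure)."""
    p = np.pad(img, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img
    return np.abs(lap)

def well_exposedness(img, sigma=0.2):
    """Gaussian closeness of intensities in [0, 1] to mid-grey 0.5 (assumed measure)."""
    return np.exp(-((img - 0.5) ** 2) / (2.0 * sigma ** 2))

def fuse_disparities(disparities, images, eps=1e-6):
    """Per-pixel weighted average of disparity maps, one map per exposure.

    disparities: list of (H, W) arrays.
    images: list of (H, W) grayscale images in [0, 1], in the same order.
    """
    weights = []
    for img in images:
        img = np.asarray(img, dtype=float)
        # Product of quality measures; eps keeps every weight strictly positive.
        weights.append((laplacian_contrast(img) + eps) * (well_exposedness(img) + eps))
    weights = np.stack(weights)
    weights /= weights.sum(axis=0, keepdims=True)  # normalise across exposures
    stack = np.stack([np.asarray(d, dtype=float) for d in disparities])
    return (weights * stack).sum(axis=0)
```

Because the weights are normalised to sum to one at each pixel, the fused value always lies between the smallest and largest per-exposure disparity at that pixel, and the exposure with the better quality score dominates.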

Data Availability

All data generated or analyzed during this study are included in this article. Additionally, a supplementary information file is provided [25]. We strongly encourage readers to see the supplementary results, which clearly reveal the effectiveness of the proposed approach with respect to state-of-the-art monocular and stereo depth estimation methods.

References

Akhavan, T., Kaufmann, H.: Backward compatible HDR stereo matching: a hybrid tone-mapping-based framework. EURASIP JIVP 2015, 1–12 (2015)
Akhavan, T., Yoo, H., Gelautz, M.: A framework for HDR stereo matching using multi-exposed images. In: HDRi2013, pp. 1–4 (2013)
Alhashim, I., Wonka, P.: High quality monocular depth estimation via transfer learning. arXiv: 1812.11941 (2018)
Anil, R., Sharma, M., Choudhary, R.: Sde-dualenet: a novel dual efficient convolutional neural network for robust stereo depth estimation. In: 2021 International Conference on Visual Communications and Image Processing (VCIP), pp. 1–5 (2021)
Burt, P., Adelson, E.: The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)
Cantrell, K.J., Miller, C.D., Morato, C.W.: Practical depth estimation with image segmentation and serial U-Nets. In: VEHITS, vol. I, pp. 406–414. INSTICC, SciTePress (2020)
Chang, J.-R., Chen, Y.-S.: Pyramid stereo matching network. In: IEEE CVPR, pp. 5410–5418 (2018)
Chari, P., Vadathya, A.K., Mitra, K.: Optimal HDR and depth from dual cameras. arXiv:2003.05907 (2020)
Choudhary, R., Sharma, M., Uma, T.V., Anil, R.: Mestereo-du2cnn: a novel dual channel CNN for learning robust depth estimates from multi-exposure stereo images for HDR 3d applications. arXiv preprint arXiv:2206.10375 (2022)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Diebel, J., Thrun, S.: An application of Markov random fields to range sensing. Adv. Neural Inf. Process. Syst. 18 (2005)
Duggal, S., Wang, S., Ma, W.-C., Hu, R., Urtasun, R.: Deeppruner: learning efficient stereo matching via differentiable patchmatch. In: ICCV (2019)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems, vol. 27, Curran Associates, Inc. (2014)
Eilertsen, G., Kronander, J., Denes, G., Mantiuk, R., Unger, J.: HDR image reconstruction from a single exposure using deep CNNs. ACM Trans. Graph. 36(6), 1–5 (2017)
Farooq Bhat, S., Alhashim, I., Wonka, P.: Adabins: depth estimation using adaptive bins. In: 2021 IEEE/CVF CVPR, pp. 4008–4017 (2021)
Ferstl, D., Ruther, M., Bischof, H.: Variational depth superresolution using example-based edge representations. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 513–521 (2015)
Garcia, F., Aouada, D., Mirbach, B., Ottersten, B.: Real-time distance-dependent mapping for a hybrid ToF multi-camera rig. IEEE J. Sel. Top. Signal Process. 6(5), 425–436 (2012)
Hao, Z., Li, Y., You, S., Lu, F.: Detail preserving depth estimation from a single image using attention guided networks. In: 2018 International Conference on 3D Vision (3DV), pp. 304–313 (2018)
Hasinoff, S.W., Durand, F., Freeman, W.T.: Noise-optimal capture for high dynamic range photography. In: IEEE CVPR, pp. 553–560 (2010)
Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008)
Hirschmuller, H., Scharstein, D.: Evaluation of cost functions for stereo matching. In: IEEE CVPR, pp. 1–8 (2007). https://vision.middlebury.edu/stereo/data/scenes2006/
Hu, J., Ozay, M., Zhang, Y., Okatani, T.: Revisiting single image depth estimation: toward higher resolution maps with accurate object boundaries. In: IEEE WACV (2019)
Hui, T.-W., Loy, C.C., Tang, X.: Depth map super-resolution by deep multi-scale guidance. In: European conference on computer vision, pp. 353–369. Springer (2016)
Im, S., Jeon, H.-G., Kweon, I.S.: Robust depth estimation using auto-exposure bracketing. IEEE Trans. Image Process. 28(5), 2451–2464 (2019)
Supplementary information file. Available: tinyurl.com/2p95b2dz
Jeong, Y., Park, J., Cho, D., Hwang, Y., Choi, S.B., Kweon, I.S.: Lightweight depth completion network with local similarity-preserving knowledge distillation. Sensors 22(19), 7388 (2022)
Kalantari, N.K., Ramamoorthi, R.: Deep high dynamic range imaging of dynamic scenes. ACM Trans. Graph. 36(4), 144 (2017)
Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., Bry, A.: End-to-end learning of geometry and context for deep stereo regression. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 66–75 (2017)
Kingma, D. P., Ba, J.: Adam: a method for stochastic optimization. Preprint arXiv:1412.6980 (2014)
Kopf, J., Cohen, M., Lischinski, D., Uyttendaele, M.: Joint bilateral upsampling. ACM Trans. Gr. (ToG) 26(3), 96–es (2007)
Kwon, H., Tai, Y.-W., Lin, S.: Data-driven depth map refinement via multi-scale sparse representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 159–167 (2015)
Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks. In: 3DV, pp. 239–248 (2016)
Li, Y., Luo, F., Li, W., Zheng, S., Huan-huan, W., Xiao, C.: Self-supervised monocular depth estimation based on image texture detail enhancement. Vis. Comput. 37(9–11), 2567–2580 (2021)
Li, Z., Liu, X., Drenkow, N., Ding, A., Creighton, F.X., Taylor, R.H., Unberath, M.: Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers. arXiv:2011.02910 (2020)
Liang, Z., Feng, Y., Guo, Y., Liu, H., Chen, W., Qiao, L., Zhou, L., Zhang, J.: Learning for disparity estimation through feature constancy. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2811–2820 (2018)
Lin, G., Milan, A., Shen, C., Reid, I.: Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: IEEE CVPR, pp. 5168–5177 (2017)
Lin, H.-Y., Kao, C.-C.: Stereo matching techniques for high dynamic range image pairs. In: Image and Video Technology, pp. 605–616 (2016)
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: IEEE CVPR, pp. 936–944 (2017)
Liu, Y.-L., Lai, W.-S., Chen, Y.-S., Kao, Y.-L., Yang, M.-H., Chuang, Y.-Y., Huang, J.-B.: Single-image HDR reconstruction by learning to reverse the camera pipeline. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1648–1657 (2020)
Malik, J., Perona, P.: Preattentive texture discrimination with early vision mechanisms. J. Opt. Soc. Am. A 7(5), 923–932 (1990)
Mayer, N., Ilg, E., Häusser, P., Fischer, P., Cremers, D., Dosovitskiy, A., Brox, T.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: 2016 IEEE CVPR, pp. 4040–4048 (2016). https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html
Mertens, T., Kautz, J., Van Reeth, F.: Exposure fusion. In: 15th Pacific Conference on Computer Graphics and Applications (PG’07), pp. 382–390 (2007)
Min, D., Lu, J., Do, M.N.: Depth video enhancement based on weighted mode filtering. IEEE Trans. Image Process. 21(3), 1176–1190 (2011)
Mozerov, M.G., van de Weijer, J.: Accurate stereo matching by two-step energy minimization. IEEE Trans. Image Process. 24(3), 1153–1163 (2015)
Nayana, A., Johnson, A.K.: High dynamic range imaging-a review. Int. J. Image Process. 9(4), 198 (2015)
Ning, S., Xu, H., Song, L., Xie, R., Zhang, W.: Learning an inverse tone mapping network with a generative adversarial regularizer. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1383–1387 (2018)
Ohta, Y., Kanade, T.: Stereo by intra- and inter-scanline search using dynamic programming. IEEE Trans. Pattern Anal. Mach. Intell. PAMI 7(2), 139–154 (1985)
Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., Koltun, V.: Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE TPAMI 1 (2020)
Riegler, G., Ferstl, D., Rüther, M., Bischof, H.: A deep primal-dual network for guided depth super-resolution. Preprint arXiv: 1607.08569 (2016)
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A., et al.: Kinectfusion: real-time 3d reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pp. 559–568 (2011)
Scharstein, D., Szeliski, R., Zabih, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. In: Proceedings IEEE Workshop on Stereo and Multi-baseline Vision (SMBV 2001), pp. 131–140 (2001)
Scharstein, D., Hirschmüller, H., Kitajima, Y., Krathwohl, G., Nešić, N., Wang, X., Westling, P.: High-resolution stereo datasets with subpixel-accurate ground truth. In: Jiang, X., Hornegger, J., Koch, R. (eds.) Pattern Recognition, pp. 31–42 (2014). https://vision.middlebury.edu/stereo/data/scenes2014
Scharstein, D., Pal, C.: Learning conditional random fields for stereo. In: IEEE CVPR, pp. 1–8 (2007). https://vision.middlebury.edu/stereo/data/scenes2005/
Schuon, S., Theobalt, C., Davis, J., Thrun, S.: Lidarboost: depth superresolution for tof 3d shape scanning. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 343–350. IEEE (2009)
Sharma, M., Sharma, A., Tushar, K.R., Panneer, A.: A novel 3d-unet deep learning framework based on high-dimensional bilateral grid for edge consistent single image depth estimation. In: IC3D, pp. 01–08 (2020)
Tan, M., Le, Q.V.: Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv: 1905.11946 (2019)
Wadaskar, A., Sharma, M., Lal, R.: A rich stereoscopic 3d high dynamic range image & video database of natural scenes. In: IC3D, pp. 1–8 (2019). https://ieeexplore.ieee.org/document/8975903
Wang, L., Yoon, K.-J.: Deep learning for HDR imaging: state-of-the-art and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence 1 (2021)
Watson, J., Firman, M., Brostow, G.J., Turmukhambetov, D.: Self-supervised monocular depth hints. In: IEEE ICCV (2019)
Xian, K., Shen, C., Cao, Z., Hao, L., Xiao, Y., Li, R., Luo, Z.: Monocular relative depth perception with web stereo data supervision. In: IEEE CVPR, pp. 311–320 (2018)
Xu, D., Ricci, E., Ouyang, W., Wang, X., Sebe, N.: Multi-scale continuous crfs as sequential deep networks for monocular depth estimation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 161–169 (2017)
Xu, D., Wang, W., Tang, H., Liu, H., Sebe, N., Ricci, E.: Structured attention guided convolutional neural fields for monocular depth estimation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3917–3925 (2018)
Yan, J., Zhao, H., Bu, P., Jin, Y.: Channel-wise attention-based network for self-supervised monocular depth estimation. In: 2021 International Conference on 3D Vision (3DV), pp. 464–473 (2021)
Yan, Q., Zhang, L., Liu, Yu., Zhu, Yu., Sun, J., Shi, Q., Zhang, Y.: Deep HDR imaging via a non-local network. IEEE Trans. Image Process. 29, 4308–4322 (2020)
Yang, G., Manela, J., Happold, M., Ramanan, D.: Hierarchical deep stereo matching on high-resolution images. In: IEEE CVPR, pp. 5510–5519 (2019)
Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)
Yang, J., Ye, X., Li, K., Hou, C., Wang, Y.: Color-guided depth recovery from RGB-D data using an adaptive autoregressive model. IEEE Trans. Image Process. 23(8), 3443–3458 (2014)
Zhao, T., Pan, S., Gao, W., Sheng, C., Sun, Y., Wei, J.: Attention unet++ for lightweight depth estimation from sparse depth samples and a single RGB image. Vis. Comput. 38(5), 1619–1630 (2022)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Coimbatore, India
Mansi Sharma
Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
Rohit Choudhary & Mansi Sharma
Department of Mechanical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
T. V. Uma & Rithvik Anil

Corresponding author

Correspondence to Mansi Sharma.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Choudhary, R., Sharma, M., Uma, T.V. et al. MEStereo-Du2CNN: a dual-channel CNN for learning robust depth estimates from multi-exposure stereo images for HDR 3D applications. Vis Comput 40, 2219–2233 (2024). https://doi.org/10.1007/s00371-023-02912-z

Accepted: 27 April 2023
Published: 05 July 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s00371-023-02912-z
