MEStereo-Du2CNN: a dual-channel CNN for learning robust depth estimates from multi-exposure stereo images for HDR 3D applications

  • Original article
  • Published in The Visual Computer

Abstract

Display technologies have evolved over the years, and it is critical to develop practical HDR capturing, processing, and display solutions to bring 3D technologies to the next level. Depth estimation from multi-exposure stereo image sequences is an essential task in the development of cost-effective 3D HDR video content. In this paper, we develop a deep architecture for multi-exposure stereo depth estimation with two novel components. First, the stereo matching technique used in traditional stereo depth estimation is revamped: a mono-to-stereo transfer learning approach is deployed for the stereo depth estimation component of our architecture. The proposed formulation circumvents the need for cost-volume construction, which is replaced by a dual-encoder single-decoder CNN with different weights for feature fusion, and EfficientNet-based blocks are used to learn the disparity. Second, the disparity maps obtained from the stereo images at different exposure levels are combined using a robust disparity feature fusion approach: they are merged using weight maps calculated from different quality measures. The final predicted disparity map is more robust and retains the best features that preserve depth discontinuities. The proposed CNN offers the flexibility to train on standard dynamic range stereo data or on multi-exposure low dynamic range stereo sequences. In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods, both quantitatively and qualitatively, on the challenging Scene Flow and differently exposed Middlebury stereo datasets. The architecture also performs exceedingly well on complex natural scenes, demonstrating its usefulness for diverse 3D HDR applications.
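
To make the two ideas above concrete, the following minimal PyTorch sketch (not the authors' implementation) pairs a dual-encoder, single-decoder network with a per-pixel weighted fusion of disparity maps estimated at different exposures. Plain strided convolution blocks stand in for the EfficientNet-based blocks, all layer sizes are illustrative assumptions, and the fusion weights are computed from local contrast and well-exposedness in the spirit of exposure fusion [42]; the exact quality measures used in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(c_in, c_out):
    """Stride-2 conv block; a stand-in for an EfficientNet-style stage."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )


class DualEncoderSingleDecoder(nn.Module):
    """Left and right views pass through separate encoders (different weights);
    their features are fused channel-wise and decoded into one disparity map,
    avoiding explicit cost-volume construction."""

    def __init__(self, base=32):
        super().__init__()
        self.enc_left = nn.Sequential(conv_block(3, base), conv_block(base, 2 * base))
        self.enc_right = nn.Sequential(conv_block(3, base), conv_block(base, 2 * base))
        self.fuse = nn.Conv2d(4 * base, 2 * base, kernel_size=1)   # feature fusion
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * base, base, kernel_size=4, stride=2, padding=1),
            nn.SiLU(inplace=True),
            nn.ConvTranspose2d(base, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, left, right):
        feats = torch.cat([self.enc_left(left), self.enc_right(right)], dim=1)
        return self.decoder(self.fuse(feats))                      # (B, 1, H, W) disparity


def fuse_disparities(disps, guides, eps=1e-6):
    """Merge per-exposure disparity maps using weight maps computed from the LDR
    guide images (local contrast x well-exposedness, as in exposure fusion)."""
    laplacian = torch.tensor([[[[0., 1., 0.],
                                [1., -4., 1.],
                                [0., 1., 0.]]]])
    weights = []
    for g in guides:                                               # g: (B, 3, H, W)
        gray = g.mean(dim=1, keepdim=True)
        contrast = F.conv2d(gray, laplacian, padding=1).abs()
        well_exposed = torch.exp(-((gray - 0.5) ** 2) / 0.08)
        weights.append(contrast * well_exposed + eps)
    w = torch.stack(weights)                                       # (E, B, 1, H, W)
    w = w / w.sum(dim=0, keepdim=True)                             # normalise over exposures
    return (w * torch.stack(disps)).sum(dim=0)                     # fused disparity


if __name__ == "__main__":
    net = DualEncoderSingleDecoder()
    left, right = torch.rand(1, 3, 128, 256), torch.rand(1, 3, 128, 256)
    # Simulate three exposure levels of the same stereo pair (illustrative only).
    exposures = [0.5, 1.0, 2.0]
    disps = [net(s * left, s * right) for s in exposures]
    fused = fuse_disparities(disps, [s * left for s in exposures])
    print(fused.shape)                                             # torch.Size([1, 1, 128, 256])
```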

Data Availability

All data generated or analyzed during this study are included in this article. A supplementary information file is also provided [25]. We encourage readers to consult the supplementary results, which demonstrate the effectiveness of the proposed approach relative to state-of-the-art monocular and stereo depth estimation methods.

References

  1. Akhavan, T., Kaufmann, H.: Backward compatible HDR stereo matching: a hybrid tone-mapping-based framework. EURASIP J. Image Video Process. 2015, 1–12 (2015)

  2. Akhavan, T., Yoo, H., Gelautz, M.: A framework for HDR stereo matching using multi-exposed images. In: HDRi2013, pp. 1–4 (2013)

  3. Alhashim, I., Wonka, P.: High quality monocular depth estimation via transfer learning. arXiv:1812.11941 (2018)

  4. Anil, R., Sharma, M., Choudhary, R.: Sde-dualenet: a novel dual efficient convolutional neural network for robust stereo depth estimation. In: 2021 International Conference on Visual Communications and Image Processing (VCIP), pp. 1–5 (2021)

  5. Burt, P., Adelson, E.: The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)

  6. Cantrell, K.J., Miller, C.D., Morato, C.W.: Practical depth estimation with image segmentation and serial U-Nets. In: VEHITS, vol. I, pp. 406–414. INSTICC, SciTePress (2020)

  7. Chang, J.-R., Chen, Y.-S.: Pyramid stereo matching network. In: IEEE CVPR, pp. 5410–5418 (2018)

  8. Chari, P., Vadathya, A.K., Mitra, K.: Optimal HDR and depth from dual cameras. arXiv:2003.05907 (2020)

  9. Choudhary, R., Sharma, M., Uma, T.V., Anil, R.: MEStereo-Du2CNN: a novel dual channel CNN for learning robust depth estimates from multi-exposure stereo images for HDR 3D applications. arXiv preprint arXiv:2206.10375 (2022)

  10. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)

  11. Diebel, J., Thrun, S.: An application of Markov random fields to range sensing. Adv. Neural Inf. Process. Syst. 18 (2005)

  12. Duggal, S., Wang, S., Ma, W.-C., Hu, R., Urtasun, R.: Deeppruner: learning efficient stereo matching via differentiable patchmatch. In: ICCV (2019)

  13. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Advances in Neural Information Processing Systems, vol. 27. Curran Associates, Inc. (2014)

  14. Eilertsen, G., Kronander, J., Denes, G., Mantiuk, R., Unger, J.: HDR image reconstruction from a single exposure using deep CNNs. ACM Trans. Graph. 36(6), 1–5 (2017)

  15. Farooq Bhat, S., Alhashim, I., Wonka, P.: Adabins: depth estimation using adaptive bins. In: 2021 IEEE/CVF CVPR, pp. 4008–4017 (2021)

  16. Ferstl, D., Ruther, M., Bischof, H.: Variational depth superresolution using example-based edge representations. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 513–521 (2015)

  17. Garcia, F., Aouada, D., Mirbach, B., Ottersten, B.: Real-time distance-dependent mapping for a hybrid ToF multi-camera rig. IEEE J. Sel. Top. Signal Process. 6(5), 425–436 (2012)

  18. Hao, Z., Li, Y., You, S., Lu, F.: Detail preserving depth estimation from a single image using attention guided networks. In: 2018 International Conference on 3D Vision (3DV), pp. 304–313 (2018)

  19. Hasinoff, S.W., Durand, F., Freeman, W.T.: Noise-optimal capture for high dynamic range photography. In: IEEE CVPR, pp. 553–560 (2010)

  20. Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008)

  21. Hirschmuller, H., Scharstein, D.: Evaluation of cost functions for stereo matching. In: IEEE CVPR, pp. 1–8 (2007). https://vision.middlebury.edu/stereo/data/scenes2006/

  22. Hu, J., Ozay, M., Zhang, Y., Okatani, T.: Revisiting single image depth estimation: toward higher resolution maps with accurate object boundaries. In: IEEE WACV (2019)

  23. Hui, T.-W., Loy, C.C., Tang, X.: Depth map super-resolution by deep multi-scale guidance. In: European conference on computer vision, pp. 353–369. Springer (2016)

  24. Im, S., Jeon, H.-G., Kweon, I.S.: Robust depth estimation using auto-exposure bracketing. IEEE Trans. Image Process. 28(5), 2451–2464 (2019)

  25. Supplementary information file. Available: tinyurl.com/2p95b2dz

  26. Jeong, Y., Park, J., Cho, D., Hwang, Y., Choi, S.B., Kweon, I.S.: Lightweight depth completion network with local similarity-preserving knowledge distillation. Sensors 22(19), 7388 (2022)

  27. Kalantari, N.K., Ramamoorthi, R.: Deep high dynamic range imaging of dynamic scenes. ACM Trans. Graph. 36(4), 144 (2017)

  28. Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., Bry, A.: End-to-end learning of geometry and context for deep stereo regression. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 66–75 (2017)

  29. Kingma, D. P., Ba, J.: Adam: a method for stochastic optimization. Preprint arXiv:1412.6980 (2014)

  30. Kopf, J., Cohen, M., Lischinski, D., Uyttendaele, M.: Joint bilateral upsampling. ACM Trans. Graph. 26(3), 96–es (2007)

  31. Kwon, H., Tai, Y.-W., Lin, S.: Data-driven depth map refinement via multi-scale sparse representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 159–167 (2015)

  32. Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks. In: 3DV, pp. 239–248 (2016)

  33. Li, Y., Luo, F., Li, W., Zheng, S., Huan-huan, W., Xiao, C.: Self-supervised monocular depth estimation based on image texture detail enhancement. Vis. Comput. 37(9–11), 2567–2580 (2021)

  34. Li, Z., Liu, X., Drenkow, N., Ding, A., Creighton, F.X., Taylor, R.H., Unberath, M.: Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers. arXiv:2011.02910 (2020)

  35. Liang, Z., Feng, Y., Guo, Y., Liu, H., Chen, W., Qiao, L., Zhou, L., Zhang, J.: Learning for disparity estimation through feature constancy. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2811–2820 (2018)

  36. Lin, G., Milan, A., Shen, C., Reid, I.: Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: IEEE CVPR, pp. 5168–5177 (2017)

  37. Lin, H.-Y., Kao, C.-C.: Stereo matching techniques for high dynamic range image pairs. In: Image and Video Technology, pp. 605–616 (2016)

  38. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: IEEE CVPR, pp. 936–944 (2017)

  39. Liu, Y.-L., Lai, W.-S., Chen, Y.-S., Kao, Y.-L., Yang, M.-H., Chuang, Y.-Y., Huang, J.-B.: Single-image HDR reconstruction by learning to reverse the camera pipeline. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1648–1657 (2020)

  40. Malik, J., Perona, P.: Preattentive texture discrimination with early vision mechanisms. J. Opt. Soc. Am. A 7(5), 923–932 (1990)

  41. Mayer, N., Ilg, E., Häusser, P., Fischer, P., Cremers, D., Dosovitskiy, A., Brox, T.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: 2016 IEEE CVPR, pp. 4040–4048 (2016). https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html

  42. Mertens, T., Kautz, J., Van Reeth, F.: Exposure fusion. In: 15th Pacific Conference on Computer Graphics and Applications (PG’07), pp. 382–390 (2007)

  43. Min, D., Lu, J., Do, M.N.: Depth video enhancement based on weighted mode filtering. IEEE Trans. Image Process. 21(3), 1176–1190 (2011)

  44. Mozerov, M.G., van de Weijer, J.: Accurate stereo matching by two-step energy minimization. IEEE Trans. Image Process. 24(3), 1153–1163 (2015)

  45. Nayana, A., Johnson, A.K.: High dynamic range imaging-a review. Int. J. Image Process. 9(4), 198 (2015)

  46. Ning, S., Xu, H., Song, L., Xie, R., Zhang, W.: Learning an inverse tone mapping network with a generative adversarial regularizer. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1383–1387 (2018)

  47. Ohta, Y., Kanade, T.: Stereo by intra- and inter-scanline search using dynamic programming. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-7(2), 139–154 (1985)

  48. Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., Koltun, V.: Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE TPAMI 1 (2020)

  49. Riegler, G., Ferstl, D., Rüther, M., Bischof, H.: A deep primal-dual network for guided depth super-resolution. Preprint arXiv:1607.08569 (2016)

  50. Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A., et al.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pp. 559–568 (2011)

  51. Scharstein, D., Szeliski, R., Zabih, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. In: Proceedings IEEE Workshop on Stereo and Multi-baseline Vision (SMBV 2001), pp. 131–140 (2001)

  52. Scharstein, D., Hirschmüller, H., Kitajima, Y., Krathwohl, G., Nešić, N., Wang, X., Westling, P.: High-resolution stereo datasets with subpixel-accurate ground truth. In: Jiang, X., Hornegger, J., Koch, R. (eds.) Pattern Recognition, pp. 31–42 (2014). https://vision.middlebury.edu/stereo/data/scenes2014

  53. Scharstein, D., Pal, C.: Learning conditional random fields for stereo. In: IEEE CVPR, pp. 1–8 (2007). https://vision.middlebury.edu/stereo/data/scenes2005/

  54. Schuon, S., Theobalt, C., Davis, J., Thrun, S.: Lidarboost: depth superresolution for ToF 3D shape scanning. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 343–350. IEEE (2009)

  55. Sharma, M., Sharma, A., Tushar, K.R., Panneer, A.: A novel 3D-UNet deep learning framework based on high-dimensional bilateral grid for edge consistent single image depth estimation. In: IC3D, pp. 1–8 (2020)

  56. Tan, M., Le, Q.V.: Efficientnet: rethinking model scaling for convolutional neural networks. arXiv:1905.11946 (2019)

  57. Wadaskar, A., Sharma, M., Lal, R.: A rich stereoscopic 3D high dynamic range image & video database of natural scenes. In: IC3D, pp. 1–8 (2019). https://ieeexplore.ieee.org/document/8975903

  58. Wang, L., Yoon, K.-J.: Deep learning for HDR imaging: state-of-the-art and future trends. IEEE Trans. Pattern Anal. Mach. Intell. (2021)

  59. Watson, J., Firman, M., Brostow, G.J., Turmukhambetov, D.: Self-supervised monocular depth hints. In: IEEE ICCV (2019)

  60. Xian, K., Shen, C., Cao, Z., Hao, L., Xiao, Y., Li, R., Luo, Z.: Monocular relative depth perception with web stereo data supervision. In: IEEE CVPR, pp. 311–320 (2018)

  61. Xu, D., Ricci, E., Ouyang, W., Wang, X., Sebe, N.: Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 161–169 (2017)

  62. Xu, D., Wang, W., Tang, H., Liu, H., Sebe, N., Ricci, E.: Structured attention guided convolutional neural fields for monocular depth estimation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3917–3925 (2018)

  63. Yan, J., Zhao, H., Bu, P., Jin, Y.: Channel-wise attention-based network for self-supervised monocular depth estimation. In: 2021 International Conference on 3D Vision (3DV), pp. 464–473 (2021)

  64. Yan, Q., Zhang, L., Liu, Yu., Zhu, Yu., Sun, J., Shi, Q., Zhang, Y.: Deep HDR imaging via a non-local network. IEEE Trans. Image Process. 29, 4308–4322 (2020)

  65. Yang, G., Manela, J., Happold, M., Ramanan, D.: Hierarchical deep stereo matching on high-resolution images. In: IEEE CVPR, pp. 5510–5519 (2019)

  66. Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)

  67. Yang, J., Ye, X., Li, K., Hou, C., Wang, Y.: Color-guided depth recovery from RGB-D data using an adaptive autoregressive model. IEEE Trans. Image Process. 23(8), 3443–3458 (2014)

  68. Zhao, T., Pan, S., Gao, W., Sheng, C., Sun, Y., Wei, J.: Attention unet++ for lightweight depth estimation from sparse depth samples and a single RGB image. Vis. Comput. 38(5), 1619–1630 (2022)

Author information

Corresponding author

Correspondence to Mansi Sharma.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Choudhary, R., Sharma, M., Uma, T.V. et al. MEStereo-Du2CNN: a dual-channel CNN for learning robust depth estimates from multi-exposure stereo images for HDR 3D applications. Vis Comput 40, 2219–2233 (2024). https://doi.org/10.1007/s00371-023-02912-z
