
CMSNet: Deep Color and Monochrome Stereo

International Journal of Computer Vision

Abstract

In this paper, we propose CMSNet (Color and Monochrome Stereo Network), an end-to-end convolutional neural network for stereo matching with a color camera and a monochrome camera. The two cameras have the same structure except for the presence of a Bayer filter, which creates a fundamental trade-off: the Bayer filter captures the chrominance information of a scene, but it reduces the quantum efficiency of the camera and therefore causes severe image noise. Ideally, we would take advantage of both cameras to obtain noise-free images together with their corresponding disparity maps. However, the image luminance recorded by a color camera is not consistent with that recorded by a monochrome camera, because of spatially-varying illumination and the different spectral sensitivities of the cameras, and this inconsistency degrades the performance of stereo matching. To solve this problem, we design CMSNet to estimate disparity from a noisy color image and a relatively clean monochrome image. CMSNet also infers a noise-free image using the estimated disparity map. To train CMSNet effectively, we use data augmentation that simulates realistic signal-dependent noise and various radiometric distortions between the input stereo pairs. We evaluate CMSNet on various datasets, where its disparity estimation and image enhancement consistently outperform state-of-the-art methods.
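The augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the noise or distortion models, so everything below is an assumption made for illustration: signal-dependent noise is approximated by a Gaussian whose variance grows linearly with intensity, and radiometric distortion is modelled as a global gain, a gamma curve, and a radial falloff. The function names and parameters (`add_signal_dependent_noise`, `radiometric_distortion`, `gain`, `read_sigma`, `vignette`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def add_signal_dependent_noise(img, gain=0.01, read_sigma=0.002, rng=None):
    """Add noise whose variance grows with intensity (assumed model).

    img is a float array in [0, 1]. The per-pixel variance is
    gain * img + read_sigma**2, a common Gaussian approximation of
    shot noise plus read noise; the paper's own model may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(gain * img + read_sigma ** 2)
    noisy = img + rng.normal(size=img.shape) * sigma
    return np.clip(noisy, 0.0, 1.0)


def radiometric_distortion(img, gain=1.0, gamma=1.0, vignette=0.0):
    """Apply a global gain, a gamma curve, and a radial falloff (assumed model).

    vignette=0 disables the spatially varying term; vignette=0.5 darkens
    the image corners to half of the centre brightness.
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    r2 = (ys - (h - 1) / 2) ** 2 + (xs - (w - 1) / 2) ** 2
    r2 = r2 / max(r2.max(), 1e-12)  # normalise so the corners have r2 == 1
    falloff = 1.0 - vignette * r2
    if img.ndim == 3:  # broadcast over colour channels
        falloff = falloff[..., None]
    out = gain * np.clip(img, 0.0, 1.0) ** gamma * falloff
    return np.clip(out, 0.0, 1.0)
```

In a training pipeline of this kind, one view of a clean stereo pair would typically receive the noise and the other a randomly drawn distortion, so that the network sees both effects; how CMSNet actually samples these parameters is not stated here.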

Notes

  1. Downloaded from https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html.

  2. We randomly crop the images for augmentation, and we generate occlusion maps by cross-checking a pair of disparity maps.

  3. http://www.vision.caltech.edu/bouguetj/calib_doc/index.html.

  4. Downloaded from https://github.com/tiancheng-zhi/cs-stereo.
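The cross-checking mentioned in Note 2 can be sketched as a standard left-right consistency test. This is a minimal illustration under stated assumptions, not the authors' code: the images are assumed to be rectified, disparities are assumed positive with the convention x_right = x_left - d, and the function name `occlusion_by_cross_check` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def occlusion_by_cross_check(disp_left, disp_right, threshold=1.0):
    """Return a boolean mask that is True where a left-view pixel is occluded.

    A pixel (y, x) in the left view is mapped to (y, x - disp_left[y, x]) in
    the right view. It is marked occluded if that location falls outside the
    image, or if the right-view disparity found there differs from the
    left-view disparity by more than `threshold` pixels.
    """
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x_right = np.rint(xs - disp_left).astype(int)
    inside = (x_right >= 0) & (x_right < w)
    x_clamped = np.clip(x_right, 0, w - 1)  # keep the lookup index valid
    disp_back = disp_right[ys, x_clamped]
    consistent = inside & (np.abs(disp_left - disp_back) <= threshold)
    return ~consistent
```

With two constant disparity maps of value 2, only the two leftmost columns are flagged, because their matches fall outside the right image. The threshold of one pixel is a common default and is an assumption here.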

Acknowledgements

This work is partly supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01842, Artificial Intelligence Graduate School Program (GIST); No. 2014-3-00077, AI National Strategy Project; No. 2021-0-02068, Artificial Intelligence Innovation Hub; No. 2020-0-00231, Development of Low Latency VR/AR Streaming Technology based on 5G edge cloud), the Vehicles AI Convergence Research & Development Program through the National IT Industry Promotion Agency of Korea (NIPA) funded by the Ministry of Science and ICT (No. S1602-20-1001), the International Collaborative Research and Development Program funded by the Korea Institute for Advancement of Technology (KIAT) (No. P146500035), and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1C1C1012635, 2020R1C1C1013210). In addition, this research was financially supported in part by the Ministry of Trade, Industry and Energy (MOTIE) and the Korea Institute for Advancement of Technology (KIAT) through the International Cooperative R&D program (P0019797).

Corresponding author

Correspondence to Sunghoon Im.

Additional information

Communicated by Jun Sato.

Appendix: Details of CMSNet Architecture

[Figure a: details of the CMSNet architecture; image not reproduced]

Cite this article

Jeon, HG., Im, S., Choe, J. et al. CMSNet: Deep Color and Monochrome Stereo. Int J Comput Vis 130, 652–668 (2022). https://doi.org/10.1007/s11263-021-01565-6

  • DOI: https://doi.org/10.1007/s11263-021-01565-6
