CMSNet: Deep Color and Monochrome Stereo

Jeon, Hae-Gon; Im, Sunghoon; Choe, Jaesung; Kang, Minjun; Lee, Joon-Young; Hebert, Martial

doi:10.1007/s11263-021-01565-6

Published: 22 January 2022

Volume 130, pages 652–668 (2022)

International Journal of Computer Vision

Hae-Gon Jeon¹, Sunghoon Im² (ORCID: orcid.org/0000-0001-9776-8101), Jaesung Choe³, Minjun Kang³, Joon-Young Lee⁴ & Martial Hebert⁵

Abstract

In this paper, we propose CMSNet (Color and Monochrome Stereo Network), an end-to-end convolutional neural network for stereo matching with a color and a monochrome camera. The two cameras share the same structure except for the Bayer filter on the color camera, which creates a fundamental trade-off: the Bayer filter allows the camera to capture the chrominance of a scene, but it limits the camera's quantum efficiency, which causes severe image noise. Ideally, we would exploit the strengths of both cameras to obtain noise-free images together with their corresponding disparity maps. However, the image luminance recorded by a color camera is not consistent with that recorded by a monochrome camera, owing to spatially varying illumination and the cameras' different spectral sensitivities, and this inconsistency degrades stereo matching performance. To solve this problem, we design CMSNet to estimate disparity from a noisy color image and a relatively clean monochrome image. CMSNet also infers a noise-free image using the estimated disparity map. To train CMSNet effectively, we use data augmentation that simulates realistic signal-dependent noise and various radiometric distortions between the input stereo pairs. We evaluate CMSNet on various datasets, and both its disparity estimation and its image enhancement consistently outperform state-of-the-art methods.
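
The abstract describes a data augmentation that simulates signal-dependent noise and radiometric distortions between the input stereo pair, but it does not give the noise model or the parameter ranges. The following is a minimal sketch under stated assumptions, not the authors' implementation: noise follows the common heteroscedastic Gaussian approximation of Poisson-Gaussian sensor noise (variance a*x + b for clean intensity x), radiometric distortion is a global gamma plus a per-channel gain, and only the color view is degraded. The function names and sampling ranges are hypothetical.

```python
import math
import random


def add_signal_dependent_noise(channel, a, b, rng):
    """Add zero-mean Gaussian noise with variance a*x + b at each pixel x.

    This heteroscedastic Gaussian model is a common approximation of
    Poisson-Gaussian sensor noise; it is an assumption, not the paper's model.
    """
    noisy = []
    for row in channel:
        out_row = []
        for x in row:
            sigma = math.sqrt(max(a * x + b, 0.0))  # brighter pixels get more noise
            y = x + rng.gauss(0.0, sigma)
            out_row.append(min(max(y, 0.0), 1.0))  # clip to the valid range [0, 1]
        noisy.append(out_row)
    return noisy


def radiometric_distortion(channel, gain, gamma):
    """Apply a global gain and gamma curve (a simple, assumed stand-in for
    exposure and spectral-sensitivity differences between the cameras)."""
    return [[min(gain * x ** gamma, 1.0) for x in row] for row in channel]


def augment_pair(color, mono, rng):
    """Degrade the color view of a (color, mono) training pair.

    color: three 2-D channels (R, G, B) with values in [0, 1].
    mono:  one 2-D channel, returned unchanged as the relatively clean view.
    """
    # Hypothetical sampling ranges, for illustration only.
    a = rng.uniform(1e-3, 1e-2)  # signal-dependent (shot) noise scale
    b = rng.uniform(1e-5, 1e-4)  # signal-independent (read) noise variance
    gamma = rng.uniform(0.8, 1.2)
    noisy_color = []
    for channel in color:
        gain = rng.uniform(0.7, 1.3)  # per-channel gain: differing spectral responses
        distorted = radiometric_distortion(channel, gain, gamma)
        noisy_color.append(add_signal_dependent_noise(distorted, a, b, rng))
    return noisy_color, mono


# Usage: a flat 3x4 gray scene seen by both cameras.
rng = random.Random(0)
color = [[[0.5] * 4 for _ in range(3)] for _ in range(3)]
mono = [[0.5] * 4 for _ in range(3)]
noisy_color, clean_mono = augment_pair(color, mono, rng)
```

Keeping the monochrome view clean mirrors the abstract's premise that the monochrome camera records relatively clean images; a fuller simulation might also add weaker noise to the monochrome view.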

Notes

Downloaded from https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html.
We randomly crop the images for augmentation and generate occlusion maps by cross-checking a pair of disparity maps.
http://www.vision.caltech.edu/bouguetj/calib_doc/index.html.
Downloaded from https://github.com/tiancheng-zhi/cs-stereo.
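
The second note states that occlusion maps are generated by cross-checking a pair of disparity maps. One common form of this check is left-right consistency: a left-view pixel at column x with disparity d is flagged when x - d falls outside the image, or when the right-view disparity at column x - d differs from d by more than a threshold. The sketch below assumes this form, a 1-pixel threshold, nearest-column rounding, and a shared crop window for all aligned maps; the helper names are hypothetical, and the note does not specify any of these details.

```python
import random


def occlusion_mask(disp_left, disp_right, threshold=1.0):
    """Left-right consistency check on a pair of disparity maps.

    Returns 1 where a left-view pixel is treated as occluded (its match
    falls outside the right view or the two disparities disagree by more
    than `threshold` pixels) and 0 where the pair is consistent.
    """
    h, w = len(disp_left), len(disp_left[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disp_left[y][x]
            xr = int(round(x - d))  # nearest matching column in the right view
            if xr < 0 or xr >= w:
                mask[y][x] = 1  # match lies outside the right image
            elif abs(d - disp_right[y][xr]) > threshold:
                mask[y][x] = 1  # inconsistent disparities: likely occluded
    return mask


def random_crop(maps, crop_h, crop_w, rng):
    """Crop the same random window from every aligned map (images,
    disparities, occlusion masks) so that correspondences are preserved."""
    h, w = len(maps[0]), len(maps[0][0])
    y0 = rng.randint(0, h - crop_h)  # randint is inclusive on both ends
    x0 = rng.randint(0, w - crop_w)
    return [[row[x0:x0 + crop_w] for row in m[y0:y0 + crop_h]] for m in maps]


# Usage: constant 1-pixel disparity; the leftmost column has no match.
disp_l = [[1.0] * 6 for _ in range(5)]
disp_r = [[1.0] * 6 for _ in range(5)]
occ = occlusion_mask(disp_l, disp_r)  # every row is [1, 0, 0, 0, 0, 0]
crops = random_crop([disp_l, disp_r, occ], 3, 4, random.Random(0))
```

Cropping every map with the same window keeps horizontal disparities valid, because both views shift by the same column offset; computing the occlusion mask before cropping avoids flagging pixels whose matches merely fall outside the crop.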

Acknowledgements

This work is partly supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01842, Artificial Intelligence Graduate School Program (GIST); No. 2014-3-00077, AI National Strategy Project; No. 2021-0-02068, Artificial Intelligence Innovation Hub; No. 2020-0-00231, Development of Low Latency VR/AR Streaming Technology based on 5G edge cloud), the Vehicles AI Convergence Research & Development Program through the National IT Industry Promotion Agency of Korea (NIPA) funded by the Ministry of Science and ICT (No. S1602-20-1001), the International Collaborative Research and Development Program funded by the Korea Institute for Advancement of Technology (KIAT) (No. P146500035), and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1C1C1012635, 2020R1C1C1013210). In addition, this research was financially supported in part by the Ministry of Trade, Industry and Energy (MOTIE) and the Korea Institute for Advancement of Technology (KIAT) through the International Cooperative R&D program (P0019797).

Author information

Authors and Affiliations

AI Graduate School & The School of Electrical Engineering and Computer Science, GIST, Gwangju, Korea
Hae-Gon Jeon
Department of Electrical Engineering and Computer Science, DGIST, Daegu, Korea
Sunghoon Im
KAIST, Daejeon, Korea
Jaesung Choe & Minjun Kang
Adobe Research, San Jose, CA, USA
Joon-Young Lee
The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Martial Hebert

Corresponding author

Correspondence to Sunghoon Im.

Additional information

Communicated by Jun Sato.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Details of CMSNet Architecture

About this article

Cite this article

Jeon, HG., Im, S., Choe, J. et al. CMSNet: Deep Color and Monochrome Stereo. Int J Comput Vis 130, 652–668 (2022). https://doi.org/10.1007/s11263-021-01565-6

Received: 07 February 2021
Accepted: 26 November 2021
Published: 22 January 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11263-021-01565-6
