Abstract
In this paper, we propose CMSNet (Color and Monochrome Stereo Network), an end-to-end convolutional neural network for stereo matching with a color camera and a monochrome camera. The two cameras have the same structure except for the presence of a Bayer filter, which introduces a fundamental trade-off: the filter allows the camera to capture the chrominance information of a scene, but it limits the quantum efficiency of the sensor, which causes severe image noise. Ideally, we would take advantage of both cameras to obtain noise-free images together with their corresponding disparity maps. However, the image luminance recorded by a color camera is not consistent with that recorded by a monochrome camera, because of spatially-varying illumination and the different spectral sensitivities of the cameras, and this inconsistency degrades the performance of stereo matching. To solve this problem, we design CMSNet to estimate disparity from a noisy color image and a relatively clean monochrome image. CMSNet also infers a noise-free image using the estimated disparity map. To train CMSNet effectively, we use data augmentation that simulates realistic signal-dependent noise and various radiometric distortions between the input stereo pairs. We evaluate CMSNet on various datasets, and it consistently outperforms state-of-the-art methods in both disparity estimation and image enhancement.
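The augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the noise or distortion models, so everything below is an assumption: the noise is a heteroscedastic Gaussian approximation whose variance grows linearly with intensity (`a * img + b`), and the radiometric distortion is a global gain followed by a gamma curve. The function names and parameter values are hypothetical and are not taken from CMSNet.

```python
import numpy as np


def add_signal_dependent_noise(img, a=0.01, b=0.0004, rng=None):
    """Add noise whose variance depends on the signal (assumed model).

    img is a float array in [0, 1]. The variance a * img + b is a common
    Poisson-Gaussian approximation, not necessarily the paper's model.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(a * img + b)  # per-pixel standard deviation
    noisy = img + sigma * rng.standard_normal(img.shape)
    return np.clip(noisy, 0.0, 1.0)


def radiometric_distort(img, gain=1.0, gamma=1.0):
    """Apply a global gain and gamma change (assumed distortion model)."""
    return np.clip(gain * np.power(img, gamma), 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.uniform(0.1, 0.9, size=(64, 64))
    # Degrade only one view, mimicking a noisy color input next to a
    # relatively clean monochrome input.
    degraded = add_signal_dependent_noise(
        radiometric_distort(clean, gain=0.8, gamma=1.2), rng=rng
    )
    print(degraded.shape, float(degraded.min()), float(degraded.max()))
```

Applying the distortion to only one image of a stereo pair is one simple way to create the luminance inconsistency the abstract describes; spatially-varying illumination would need a per-pixel gain map instead of the scalar `gain` used here.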
Notes
We randomly crop the images for augmentation and generate occlusion maps by cross-checking a pair of disparity maps.
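As a rough illustration of this note, the sketch below shows a left-right cross-check and a shared random crop. It assumes rectified images, positive disparities, and the convention that a left-view pixel at column x corresponds to the right-view pixel at column x - d. The one-pixel threshold and the function names are hypothetical choices, not details from the paper.

```python
import numpy as np


def occlusion_by_crosscheck(disp_left, disp_right, thresh=1.0):
    """Return a boolean map that is True where the left view fails the check.

    A left pixel (y, x) with disparity d is looked up at (y, x - d) in the
    right disparity map. It is marked occluded if that position is outside
    the image or if the two disparities differ by more than thresh.
    """
    h, w = disp_left.shape
    xs = np.tile(np.arange(w), (h, 1))
    xr = np.rint(xs - disp_left).astype(int)  # matching column in right view
    inside = (xr >= 0) & (xr < w)
    xr_clipped = np.clip(xr, 0, w - 1)
    disp_back = np.take_along_axis(disp_right, xr_clipped, axis=1)
    consistent = inside & (np.abs(disp_left - disp_back) <= thresh)
    return ~consistent


def random_crop(arrays, crop_h, crop_w, rng=None):
    """Crop the same random window from every array in the list."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = arrays[0].shape[:2]
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    return [a[top:top + crop_h, left:left + crop_w] for a in arrays]


if __name__ == "__main__":
    disp_l = np.full((4, 8), 2.0)
    disp_r = np.full((4, 8), 2.0)
    occ = occlusion_by_crosscheck(disp_l, disp_r)
    # Count pixels whose match falls outside the right image.
    print(int(occ.sum()))
```

Cropping the left image, right image, and disparity map with the same window keeps the disparity values valid, because the column offset between corresponding pixels is unchanged.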
Downloaded from https://github.com/tiancheng-zhi/cs-stereo.
Acknowledgements
This work is partly supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01842, Artificial Intelligence Graduate School Program (GIST); No. 2014-3-00077, AI National Strategy Project; No. 2021-0-02068, Artificial Intelligence Innovation Hub; No. 2020-0-00231, Development of Low Latency VR/AR Streaming Technology based on 5G edge cloud), the Vehicles AI Convergence Research & Development Program through the National IT Industry Promotion Agency of Korea (NIPA) funded by the Ministry of Science and ICT (No. S1602-20-1001), the International Collaborative Research and Development Program funded by the Korea Institute for Advancement of Technology (KIAT) (No. P146500035), and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1C1C1012635, 2020R1C1C1013210). In addition, this research was financially supported in part by the Ministry of Trade, Industry and Energy (MOTIE) and the Korea Institute for Advancement of Technology (KIAT) through the International Cooperative R&D program (P0019797).
Additional information
Communicated by Jun Sato.
Appendix: Details of CMSNet Architecture
Cite this article
Jeon, HG., Im, S., Choe, J. et al. CMSNet: Deep Color and Monochrome Stereo. Int J Comput Vis 130, 652–668 (2022). https://doi.org/10.1007/s11263-021-01565-6
DOI: https://doi.org/10.1007/s11263-021-01565-6