
CVE-Net: cost volume enhanced network guided by sparse features for stereo matching

  • Soft computing in decision making and in modeling in economics

Abstract

Deep learning based on convolutional neural networks (CNNs) has been successfully applied to stereo matching, as it can accelerate the training process and improve matching accuracy. However, existing CNN-based stereo matching frameworks often suffer from two problems. The first is the limited generalization ability of the trained model. Stereo matching frameworks are usually pre-trained on the large synthetic Scene Flow dataset and then fine-tuned on an evaluation dataset. However, the evaluation dataset may contain only trivial training data or may even lack disparity labels for certain tasks, which adversely affects the generality of the trained model. The second is poor matching performance in ill-posed regions, which are difficult to distinguish and include weak-texture areas, repeated-texture areas, occluded areas, reflective structures, and fine structures. To ameliorate these problems, we propose the cost volume enhancement network (CVE-Net) guided by sparse features for stereo matching. CVE-Net uses edge and saliency information to sparsely sample precise disparity labels during training, and it enhances the cost volume by leveraging this sparse label information to guide the direction of training. Experiments show that the generalization ability is significantly improved and the domain-transfer problem on new datasets is significantly alleviated. In addition, introducing the sparse multiple semantic features improves matching performance in ill-posed regions; even without fine-tuning, the matching requirements can be met. These results demonstrate the effectiveness of CVE-Net.
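The two ideas summarized above, sparse sampling of precise disparity labels at edge/salient pixels and using those labels to enhance the cost volume, can be illustrated with a minimal sketch. This is not the authors' implementation; the tensor shapes, the product of edge and saliency scores, and the Gaussian-peak enhancement are assumptions made here purely for illustration.

    import torch

    def sparse_label_mask(edge_map, saliency_map, keep_ratio=0.05):
        # Keep only pixels where both the edge and saliency responses are strong.
        score = edge_map * saliency_map                      # (H, W), assumed in [0, 1]
        k = max(1, int(keep_ratio * score.numel()))
        threshold = torch.topk(score.flatten(), k).values.min()
        return score >= threshold                            # boolean sampling mask

    def enhance_cost_volume(cost_volume, gt_disparity, mask, sigma=1.0, weight=0.5):
        # cost_volume: (D, H, W) matching scores; gt_disparity: (H, W); mask: (H, W) bool.
        # At the sparsely sampled pixels, add a Gaussian peak centred on the true
        # disparity so the volume is biased towards the correct match during training.
        D = cost_volume.shape[0]
        d = torch.arange(D, dtype=cost_volume.dtype).view(D, 1, 1)
        peak = torch.exp(-(d - gt_disparity.unsqueeze(0)) ** 2 / (2 * sigma ** 2))
        return cost_volume + weight * peak * mask.unsqueeze(0).to(cost_volume.dtype)

In this sketch, supervision is injected only at the sparsely sampled pixels, which mirrors the idea of letting precise sparse labels guide the direction of training rather than supervising every pixel densely.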



Notes

  1. http://www.cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo.

  2. http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo.


Acknowledgements

This work was supported by the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2019A1515011078) and the Guangzhou Scientific and Technological Plan Project (No. 201904010228).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the research, the experiments, and the manuscript. Guangyi Huang and Yongyi Gong were responsible for the design of the algorithm and the preparation of the experiments. The experiments and related discussion were carried out by Qingzhen Xu, Shuang Liu, Guangyi Huang, Kun Zeng, Yongyi Gong and Xiaonan Luo. Qingzhen Xu, Shuang Liu and Guangyi Huang wrote the manuscript. Kun Zeng, Yongyi Gong and Xiaonan Luo were responsible for the final optimization. All authors commented on previous versions of the manuscript and read and approved the final manuscript.

Corresponding authors

Correspondence to Yongyi Gong or Xiaonan Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Xu, Q., Liu, S., Huang, G. et al. CVE-Net: cost volume enhanced network guided by sparse features for stereo matching. Soft Comput 25, 15183–15199 (2021). https://doi.org/10.1007/s00500-021-06257-4

