
Area-based correlation and non-local attention network for stereo matching

  • Original article
The Visual Computer

Abstract

Stereo matching plays an essential role in various computer vision applications. The cost volume is a crucial part of disparity estimation, since it measures the similarity between left and right feature locations. However, most previous cost volume constructions, which are based on concatenation or pixel-wise correlation, lack local similarity, and this leads to unsatisfactory performance on large textureless regions. To tackle this problem, we propose a simple but efficient method for stereo matching, called the area-based correlation and non-local attention network (Abc-Net). First, we exploit area-based correlation to capture more local similarity in the cost volume. The left and right features are sliced into patches of various sizes along the channel dimension. Correlation maps are calculated between the left feature patches and the corresponding traversed right patches, and are then packed into a 4D area-based cost volume. Second, we combine the hourglass module with a non-local attention module to form the 3D feature matching module, which exploits various spatial relationships and global information. The experiments show that (1) the area-based correlation captures local similarity and increases accuracy on large textureless regions, (2) the improved 3D feature matching module exploits global context information to further improve performance, and (3) our method achieves competitive results on the SceneFlow, KITTI 2012, and KITTI 2015 datasets.
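
To make the idea of an area-based cost volume more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It rests on several assumptions that the abstract does not confirm: the channels are split into equal groups, each group is paired with its own square window size, the right features are traversed by shifting them horizontally over candidate disparities, and the correlation is the channel-averaged product further averaged over the window. The names `box_mean`, `area_correlation_volume`, and `window_sizes` are hypothetical.

```python
import numpy as np

def box_mean(x, k):
    """Mean over a k x k window (k odd, zero padding) of a 2D map x of shape (H, W)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="constant")
    h, w = x.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + h, dx:dx + w]
    return out / (k * k)

def area_correlation_volume(left, right, max_disp, window_sizes):
    """Assumed form of an area-based cost volume.

    left, right: feature maps of shape (C, H, W).
    window_sizes: one odd window size per channel group (hypothetical parameter).
    Returns an array of shape (G, max_disp, H, W) with G = len(window_sizes).
    """
    c, h, w = left.shape
    g = len(window_sizes)
    assert c % g == 0, "channels must split evenly into groups"
    cpg = c // g  # channels per group
    volume = np.zeros((g, max_disp, h, w), dtype=np.float64)
    for d in range(max_disp):
        # Shift the right features so column x - d lines up with left column x.
        shifted = np.zeros_like(right)
        if d == 0:
            shifted[:] = right
        else:
            shifted[:, :, d:] = right[:, :, :-d]
        prod = left * shifted  # pixel-wise products, shape (C, H, W)
        for gi, k in enumerate(window_sizes):
            # Channel-averaged correlation for this group, then area averaging.
            pixel_corr = prod[gi * cpg:(gi + 1) * cpg].mean(axis=0)
            volume[gi, d] = box_mean(pixel_corr, k)
    return volume
```

With a window size of 1 this reduces to a group-wise pixel correlation; larger windows pool the correlation over a neighbourhood, which is one plausible way to inject local similarity in textureless regions.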

Author information

Corresponding authors

Correspondence to Xing Li or Guoyun Lv.

Ethics declarations

Funding

This work was supported by the Key Program of Research and Development Plan of Shaanxi Province under Grant 2020ZDLGY04-09 funded by the Department of Science and Technology of Shaanxi Province.

Conflict of Interest

The authors declare that they have no conflicts of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Detailed network structure. The core architecture of the Abc-Net framework comprises 2D feature extraction, cost volume construction, the hourglass and non-local module, the output module, and geometry refinement. The detailed structure of our method is presented in Table 6. Unless otherwise specified, each 2D or 3D convolutional layer consists of three steps: convolution, batch normalization (BN), and a ReLU nonlinearity.
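
The three-step layer described above can be sketched in NumPy for the 2D case. This is an illustrative sketch only: Table 6 is not reproduced here, so the kernel size, stride 1, "same" zero padding, and the use of batch statistics in BN (training-mode behaviour) are all assumptions, and the function names `conv2d`, `batch_norm`, and `conv_bn_relu` are hypothetical.

```python
import numpy as np

def conv2d(x, weight):
    """Stride-1 cross-correlation with 'same' zero padding (assumed settings).

    x: (N, C_in, H, W); weight: (C_out, C_in, k, k) with k odd.
    """
    n, c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((n, c_out, h, w))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, :, dy:dy + h, dx:dx + w]  # (N, C_in, H, W)
            out += np.einsum("nchw,oc->nohw", patch, weight[:, :, dy, dx])
    return out

def batch_norm(x, gamma, beta, eps=1e-5):
    """Per-channel normalization using batch statistics (training-mode assumption)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

def conv_bn_relu(x, weight, gamma, beta):
    """Convolution, then BN, then ReLU, in the order stated in the text."""
    return np.maximum(batch_norm(conv2d(x, weight), gamma, beta), 0.0)
```

A 3D layer would follow the same pattern with an extra disparity axis in the input, the kernel, and the normalization statistics.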

Cite this article

Li, X., Fan, Y., Lv, G. et al. Area-based correlation and non-local attention network for stereo matching. Vis Comput 38, 3881–3895 (2022). https://doi.org/10.1007/s00371-021-02228-w
