Multiple attention networks for stereo matching

Published in: Multimedia Tools and Applications

Abstract

Recent studies have shown that stereo matching can be cast as a supervised learning task, in which pairs of left and right images serve as inputs to a convolutional neural network and a detailed disparity map is produced. However, existing architectures for stereo matching struggle to estimate depth in ill-posed regions. To address this problem, we propose a multiple attention network (MA-Net) for stereo matching, which consists of four stages: feature extraction, cost volume construction, cost aggregation, and disparity prediction. For feature extraction, we adopt an hourglass position attention module that effectively aggregates global context and multi-scale information at every position. For cost volume construction, we combine cross-correlation volumes with concatenation volumes so that the cost volume provides an efficient representation for measuring feature similarity. For cost aggregation, we design a multiscale disparity attention module that aggregates feature information across different scales and disparity dimensions. As in other end-to-end methods, the final disparity is obtained by regression in the disparity prediction stage. Experimental results on the Scene Flow, KITTI 2012, and KITTI 2015 benchmarks show that the proposed method offers advantages in both accuracy and speed.
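The abstract names two components concretely enough to illustrate: a cost volume that fuses cross-correlation with feature concatenation, and disparity prediction by regression. The following PyTorch sketch shows both under assumed shapes; it is not the authors' implementation. The function names (build_cost_volume, soft_argmin), the max_disp parameter, and the channel layout are hypothetical, and the paper's hourglass position attention and multiscale disparity attention modules are omitted.

```python
# Minimal sketch (not the authors' code) of two ideas named in the abstract:
# (1) a cost volume combining cross-correlation with feature concatenation,
# (2) disparity prediction by soft-argmin regression over the cost volume.
# Shapes, names, and max_disp are illustrative assumptions.
import torch
import torch.nn.functional as F


def build_cost_volume(left_feat, right_feat, max_disp):
    """Fuse a cross-correlation volume with a concatenation volume.

    left_feat, right_feat: [B, C, H, W] feature maps from a shared extractor.
    Returns a [B, 2*C + 1, max_disp, H, W] volume: channel 0 holds the
    correlation score, the remaining channels the concatenated features.
    """
    B, C, H, W = left_feat.shape
    volume = left_feat.new_zeros(B, 2 * C + 1, max_disp, H, W)
    for d in range(max_disp):
        l = left_feat[:, :, :, d:]          # left pixels with a valid match
        r = right_feat[:, :, :, : W - d]    # right pixels shifted by d
        # Cross-correlation: per-pixel dot product of left/right features.
        volume[:, 0:1, d, :, d:] = (l * r).mean(dim=1, keepdim=True)
        # Concatenation: keep both feature vectors as richer matching cues.
        volume[:, 1:C + 1, d, :, d:] = l
        volume[:, C + 1:, d, :, d:] = r
    return volume


def soft_argmin(cost, max_disp):
    """Disparity regression: softmax over the negated cost along the
    disparity axis, then the expectation of the disparity indices.

    cost: [B, max_disp, H, W] aggregated matching cost.
    Returns a [B, H, W] continuous disparity map.
    """
    prob = F.softmax(-cost, dim=1)
    disps = torch.arange(max_disp, device=cost.device, dtype=cost.dtype)
    return (prob * disps.view(1, max_disp, 1, 1)).sum(dim=1)


if __name__ == "__main__":
    B, C, H, W, D = 1, 32, 64, 128, 48
    left, right = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
    vol = build_cost_volume(left, right, D)   # [1, 65, 48, 64, 128]
    # The paper aggregates the volume with attention modules; averaging the
    # feature channel here is only a stand-in to keep the sketch runnable.
    cost = vol.mean(dim=1)                    # [1, 48, 64, 128]
    disp = soft_argmin(cost, D)               # [1, 64, 128]
    print(disp.shape)
```

Soft-argmin regression, rather than a hard argmin over disparities, keeps the prediction differentiable, which is what lets such networks be trained end to end.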


Funding

This work was supported in part by the Scientific Research Fund of the Education Department of Hunan Province (19B245, 19A200, 18B349, 18B345), the Science and Technology Program of Hunan Province (2016TP1021), the Hunan Provincial Natural Science Foundation (2019JJ40104, 2019JJ40110), the Hunan Postgraduate Scientific Research Project of Innovation (CX20190933, CX20190930), the Hunan Emergency Communication Engineering Technology Research Center (2018TP2022), and the Engineering Research Center on 3D Reconstruction and Intelligent Application Technology of Hunan Province (2019-430602-73-03-006049).

Author information

Corresponding author

Correspondence to Longyuan Guo.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 9 Description of the symbols used in the equations in the main text


About this article


Cite this article

Guo, L., Duan, H. & Zhou, W. Multiple attention networks for stereo matching. Multimed Tools Appl 80, 28583–28601 (2021). https://doi.org/10.1007/s11042-021-11102-9

