Multiple attention networks for stereo matching

Published in: Multimedia Tools and Applications

Abstract

Recent studies have shown that stereo matching can be cast as a supervised learning task, in which pairs of left and right images serve as inputs to a convolutional neural network and a detailed disparity map is produced. However, existing architectures for stereo matching struggle to estimate depth in ill-posed regions. To address this problem, we propose a multiple attention network (MA-Net) for stereo matching, which consists of four stages: feature extraction, cost volume construction, cost aggregation, and disparity prediction. For feature extraction, we adopt an hourglass position attention module that effectively aggregates global context and multi-scale information at every position. For cost volume construction, we combine cross-correlation volumes with concatenation volumes so that the cost volume provides an efficient representation for measuring feature similarity. For cost aggregation, we design a multiscale disparity attention module that aggregates feature information across different scales and disparity dimensions. As in other end-to-end methods, the final disparity is obtained by regression in the disparity prediction stage. Experimental results on the Scene Flow, KITTI 2012, and KITTI 2015 benchmarks show that the proposed method offers advantages in both accuracy and speed.
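The abstract names two components concretely enough to illustrate: a cost volume that fuses cross-correlation with feature concatenation, and disparity prediction by regression. The following PyTorch sketch shows both under assumed shapes; it is not the authors' implementation. The function names (build_cost_volume, soft_argmin), the max_disp parameter, and the channel layout are hypothetical, and the paper's hourglass position attention and multiscale disparity attention modules are omitted.

```python
# Minimal sketch (not the authors' code) of two ideas named in the abstract:
# (1) a cost volume combining cross-correlation with feature concatenation,
# (2) disparity prediction by soft-argmin regression over the cost volume.
# Shapes, names, and max_disp are illustrative assumptions.
import torch
import torch.nn.functional as F


def build_cost_volume(left_feat, right_feat, max_disp):
    """Fuse a cross-correlation volume with a concatenation volume.

    left_feat, right_feat: [B, C, H, W] feature maps from a shared extractor.
    Returns a [B, 2*C + 1, max_disp, H, W] volume: channel 0 holds the
    correlation score, the remaining channels the concatenated features.
    """
    B, C, H, W = left_feat.shape
    volume = left_feat.new_zeros(B, 2 * C + 1, max_disp, H, W)
    for d in range(max_disp):
        l = left_feat[:, :, :, d:]          # left pixels with a valid match
        r = right_feat[:, :, :, : W - d]    # right pixels shifted by d
        # Cross-correlation: per-pixel dot product of left/right features.
        volume[:, 0:1, d, :, d:] = (l * r).mean(dim=1, keepdim=True)
        # Concatenation: keep both feature vectors as richer matching cues.
        volume[:, 1:C + 1, d, :, d:] = l
        volume[:, C + 1:, d, :, d:] = r
    return volume


def soft_argmin(cost, max_disp):
    """Disparity regression: softmax over the negated cost along the
    disparity axis, then the expectation of the disparity indices.

    cost: [B, max_disp, H, W] aggregated matching cost.
    Returns a [B, H, W] continuous disparity map.
    """
    prob = F.softmax(-cost, dim=1)
    disps = torch.arange(max_disp, device=cost.device, dtype=cost.dtype)
    return (prob * disps.view(1, max_disp, 1, 1)).sum(dim=1)


if __name__ == "__main__":
    B, C, H, W, D = 1, 32, 64, 128, 48
    left, right = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
    vol = build_cost_volume(left, right, D)   # [1, 65, 48, 64, 128]
    # The paper aggregates the volume with attention modules; averaging the
    # feature channel here is only a stand-in to keep the sketch runnable.
    cost = vol.mean(dim=1)                    # [1, 48, 64, 128]
    disp = soft_argmin(cost, D)               # [1, 64, 128]
    print(disp.shape)
```

Soft-argmin regression, rather than a hard argmin over disparities, keeps the prediction differentiable, which is what lets such networks be trained end to end.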


Funding

This work was supported in part by the Scientific Research Fund of the Education Department of Hunan Province (19B245, 19A200, 18B349, 18B345), the Science and Technology Program of Hunan Province (2016TP1021), the Hunan Provincial Natural Science Foundation (2019JJ40104, 2019JJ40110), the Hunan Postgraduate Scientific Research Project of Innovation (CX20190933, CX20190930), the Hunan Emergency Communication Engineering Technology Research Center (2018TP2022), and the Engineering Research Center on 3D Reconstruction and Intelligent Application Technology of Hunan Province (2019-430602-73-03-006049).

Author information

Corresponding author

Correspondence to Longyuan Guo.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 9 Description of the symbols used in the equations in the main text


About this article


Cite this article

Guo, L., Duan, H. & Zhou, W. Multiple attention networks for stereo matching. Multimed Tools Appl 80, 28583–28601 (2021). https://doi.org/10.1007/s11042-021-11102-9

