Abstract
Recent studies have shown that stereo matching can be treated as a supervised learning task, in which pairs of left and right images are used to train a convolutional neural network that outputs a detailed disparity map. However, existing stereo matching architectures are not well suited to estimating depth in ill-posed regions. To address this problem, we propose a multiple attention network (MA-Net) for stereo matching, which consists of four main stages: feature extraction, cost volume construction, cost aggregation, and disparity prediction. For feature extraction, we adopt an hourglass position attention module that effectively aggregates global context and multi-scale information at every position. For cost volume construction, we combine cross-correlation volumes with concatenation volumes so that the cost volume provides efficient representations for measuring feature similarity. For cost aggregation, we design a multi-scale disparity attention module that aggregates feature information across different scales and different disparity dimensions. As in other end-to-end methods, the final disparity is obtained by regression in the disparity prediction stage. Experimental results on the Scene Flow, KITTI 2012 and KITTI 2015 benchmarks show that the proposed method has several advantages in terms of accuracy and speed.
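The cost volume construction and disparity regression stages mentioned above can be illustrated with a minimal NumPy sketch. This is not the MA-Net implementation: the function names `build_cost_volume` and `regress_disparity`, the channel-mean form of the correlation, and the soft-argmin form of the regression (common in end-to-end stereo networks) are assumptions made for illustration, and the attention modules and learned cost aggregation are omitted entirely.

```python
import numpy as np


def build_cost_volume(left, right, max_disp):
    """Combine concatenation and correlation volumes (illustrative sketch).

    left, right: feature maps of shape (C, H, W).
    Returns a volume of shape (2C + 1, max_disp, H, W): C left channels,
    C shifted right channels, and one channel-mean correlation channel.
    """
    c, h, w = left.shape
    if max_disp > w:
        raise ValueError("max_disp must not exceed the feature width")
    volume = np.zeros((2 * c + 1, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        # A left pixel at column x is compared with the right pixel at x - d.
        # Columns x < d have no counterpart and are left at zero.
        l = left[:, :, d:]
        r = right[:, :, : w - d]
        volume[:c, d, :, d:] = l                          # concatenation: left
        volume[c:2 * c, d, :, d:] = r                     # concatenation: right
        volume[2 * c, d, :, d:] = (l * r).mean(axis=0)    # correlation
    return volume


def regress_disparity(cost):
    """Soft-argmin regression (assumed form) over a (D, H, W) matching cost.

    Lower cost means a better match. Returns an (H, W) disparity map as the
    probability-weighted mean of the candidate disparities.
    """
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    disps = np.arange(cost.shape[0], dtype=cost.dtype).reshape(-1, 1, 1)
    return (prob * disps).sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left_feat = rng.standard_normal((4, 3, 10))
    right_feat = rng.standard_normal((4, 3, 10))
    vol = build_cost_volume(left_feat, right_feat, max_disp=5)
    # Use the negated correlation channel as a stand-in for an aggregated cost.
    disparity = regress_disparity(-vol[-1])
    print(vol.shape, disparity.shape)
```

In a full network the combined volume would first pass through the cost aggregation stage; here the negated correlation channel stands in for that aggregated cost only so that the regression step has an input.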

Funding
This work has been supported in part by the Scientific Research Fund of Education Department of Hunan Province (19B245, 19A200, 18B349, 18B345), the Science and Technology Program of Hunan Province (2016TP1021), the Hunan Provincial Natural Science Foundation (2019JJ40104, 2019JJ40110), the Hunan postgraduate scientific research project of innovation (CX20190933, CX20190930), the Hunan Emergency Communication Engineering Technology Research Center (2018TP2022), and the Engineering Research Center on 3D Reconstruction and Intelligent Application Technology of Hunan Province (2019–430602–73-03-006049).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.
Cite this article
Guo, L., Duan, H. & Zhou, W. Multiple attention networks for stereo matching. Multimed Tools Appl 80, 28583–28601 (2021). https://doi.org/10.1007/s11042-021-11102-9
DOI: https://doi.org/10.1007/s11042-021-11102-9