
STTR-3D: Stereo Transformer 3D Network for Video-Based Disparity Change Estimation

  • Conference paper
  • In: Web and Big Data (APWeb-WAIM 2023)

Abstract

In the field of computer vision and stereo depth estimation, little research has addressed obtaining high-accuracy disparity change maps from two-dimensional images. Such a map carries information that bridges the gap between optical flow and depth, which is desirable for numerous academic research problems and industrial applications, such as navigation systems, driving assistance, and autonomous systems. We introduce STTR-3D, a 3D extension of the STereo TRansformer (STTR) that leverages transformers and an attention mechanism for stereo depth estimation. We further make use of the Scene Flow FlyingThings3D dataset, which openly includes disparity change data, and apply 1) refinements that use an MLP over the relative position encoding and 2) a regression head with entropy-regularized optimal transport to obtain the disparity change map. The model consistently outperforms the original on depth estimation tasks. Unlike existing supervised methods for stereo depth estimation, our technique handles both disparity estimation and disparity change in a single end-to-end network, and establishes that the added transformer yields improved performance, achieving high precision on both problems.
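The entropy-regularized optimal transport mentioned in the abstract is commonly computed with Sinkhorn iterations, which turn a pixel-matching cost matrix into a differentiable soft assignment. The sketch below is illustrative only, not the paper's implementation: it assumes uniform marginals, a single scanline pair, and a toy cost matrix whose true matching is the identity; the function names `sinkhorn` and `soft_disparity` are our own.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: (n, n) matching cost between pixels of the left and right
          scanline; eps is the entropy weight (smaller -> sharper matching).
    Returns an (approximately) doubly stochastic soft-assignment matrix.
    """
    K = np.exp(-cost / eps)              # Gibbs kernel of the cost matrix
    u = np.ones(cost.shape[0])
    v = np.ones(cost.shape[1])
    for _ in range(n_iters):             # alternate row / column rescaling
        u = 1.0 / (K @ v)
        v = 1.0 / (K.T @ u)
    return u[:, None] * K * v[None, :]

def soft_disparity(assignment):
    """Differentiable disparity: row index minus expected matched column."""
    probs = assignment / assignment.sum(axis=1, keepdims=True)
    expected_col = probs @ np.arange(assignment.shape[1])
    return np.arange(assignment.shape[0]) - expected_col

# Toy scanline pair that is already aligned (true disparity = 0 everywhere),
# so the unique optimal matching is the identity permutation.
cost = np.abs(np.arange(8)[:, None] - np.arange(8)[None, :]).astype(float)
A = sinkhorn(cost)
disparity = soft_disparity(A)
```

Because every step is a smooth rescaling of the kernel, the whole readout is differentiable, which is what makes an optimal-transport regression head trainable end to end; as `eps` shrinks, the soft assignment approaches a hard matching.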



Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grant No. 62072053 and the National Natural Fund Joint Fund Project under Grant No. U21B2041.

Author information


Corresponding author

Correspondence to Huansheng Song.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Yang, Q., Rakai, L., Sun, S., Song, H., Song, X., Akhtar, N. (2024). STTR-3D: Stereo Transformer 3D Network for Video-Based Disparity Change Estimation. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_15


  • DOI: https://doi.org/10.1007/978-981-97-2421-5_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2420-8

  • Online ISBN: 978-981-97-2421-5

  • eBook Packages: Computer Science, Computer Science (R0)
