
A Transformer-Based Architecture for High-Resolution Stereo Matching


Abstract:

The Transformer architecture is now widely used thanks to its strong parallel-computing and global-modelling capabilities. In this paper, we build a dense Feature Extraction Transformer (FET) for stereo matching that combines Transformer and convolution blocks. FET offers three advantages for stereo matching: 1) for high-resolution stereo image pairs, Transformer blocks combined with spatial-pyramid-pooling windows capture a wide range of contextual representations while maintaining linear computational complexity; 2) convolution and transposed-convolution blocks respectively implement overlapping patch embedding, which allows features to capture enough neighbourhood information to facilitate fine-grained matching; 3) FET creatively uses a jump-query strategy to apply the Transformer encoder and decoder structures to feature extraction simultaneously. Furthermore, to obtain an architecture based more thoroughly on the Transformer, we adopt STTR's (Li et al., 2021) attention-based pixel-matching strategy. Our model achieves a 0.32 end-point error and a 0.89% 3-px error on the Scene Flow benchmark (absolute improvements of 30.95 and 29.36 points over STTR). On the KITTI 2015 benchmark, our model achieves a 1.80 D1-bg on estimated pixels (a 1.57-point error reduction compared to STTR).
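Overlapping patch embedding of the kind described in point 2 is commonly realized with a strided convolution whose kernel is larger than its stride, so that adjacent patches share pixels. A minimal sketch of the patch-count arithmetic follows; the kernel, stride, and padding values are illustrative assumptions, not taken from the paper:

```python
def num_patches(size, kernel, stride, padding):
    """Number of patches along one spatial dimension produced by a
    strided convolution (standard conv output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1

h = 32  # spatial size of one feature-map dimension

# Non-overlapping embedding: kernel == stride (ViT-style 4x4 patches).
print(num_patches(h, kernel=4, stride=4, padding=0))  # 8 patches per side

# Overlapping embedding: kernel > stride, so neighbouring patches share
# pixels and each token sees local context beyond its own patch, while
# the token count stays the same.
print(num_patches(h, kernel=7, stride=4, padding=3))  # still 8 per side
```

Because the stride is unchanged, the overlapping variant keeps the same token grid while enlarging each token's receptive field, which is what supports the fine-grained matching claimed above.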
Published in: IEEE Transactions on Computational Imaging ( Volume: 10)
Page(s): 83 - 92
Date of Publication: 10 January 2024

