A multi-scale feature cross-dimensional interaction network for stereo image super-resolution

Zhang, Jingcheng; Zhu, Yu; Peng, Shengjun; Niu, Axi; Yan, Qingsen; Sun, Jinqiu; Zhang, Yanning

doi:10.1007/s00530-025-01714-8

Regular Paper
Published: 17 February 2025

Volume 31, article number 114 (2025)

Multimedia Systems

Jingcheng Zhang¹,
Yu Zhu¹,
Shengjun Peng³,
Axi Niu¹,
Qingsen Yan¹,
Jinqiu Sun² &
Yanning Zhang¹

Abstract

Stereo image super-resolution (SSR) has recently achieved impressive performance by leveraging both intra-view and inter-view information. However, existing SSR methods often extract stereo image features at a single scale and overlook multi-dimensional feature interactions, so the reconstructed details are blurred and insufficiently sharp. To address these issues, we propose a multi-scale feature cross-dimensional interaction network (MFCINet) for SSR. Specifically, to fully exploit intra-view information, we design three multi-scale feature extraction blocks that capture abundant multi-scale texture patterns: the Local Feature Extraction Block (LFEB), the Mesoscale Feature Extraction Block (MFEB), and the Global Feature Extraction Block (GFEB). We progressively fuse smaller-scale features into larger-scale features, so that the local texture information in the smaller-scale features refines the global structure information in the larger-scale features. To explore richer interactions between complementary features, we introduce the Cross-dimensional Attention Interaction Block (CAIB), which computes attention between complementary features across both spatial positions and channels, enabling comprehensive interaction across these dimensions. Extensive experiments and ablation studies demonstrate that MFCINet better leverages intra-view and inter-view information to reconstruct clear texture details, achieving results that outperform state-of-the-art methods.
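
The idea of attending across both spatial positions and channels can be illustrated with a minimal sketch. This is not the authors' CAIB: the abstract does not specify its layers, so the function name `cross_dimensional_attention`, the C x N feature layout, the scaled dot-product softmax form, and the additive fusion are all assumptions made only for illustration. Features of each view are stored as a matrix with C channel rows and N spatial positions.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def softmax_rows(a):
    """Apply a numerically stable softmax to every row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def scale(a, k):
    """Multiply every entry of a matrix by the scalar k."""
    return [[v * k for v in row] for row in a]

def cross_dimensional_attention(left, right):
    """Hypothetical sketch: refine `left` (C x N) with `right` (C x N).

    Assumption, not taken from the paper: one spatial branch and one
    channel branch, both scaled dot-product attention, fused by addition.
    """
    c, n = len(left), len(left[0])

    # Spatial branch: every left position attends over all right positions.
    spatial_attn = softmax_rows(
        scale(matmul(transpose(left), right), 1.0 / math.sqrt(c)))  # N x N
    spatial_out = matmul(right, transpose(spatial_attn))            # C x N

    # Channel branch: every left channel attends over all right channels.
    channel_attn = softmax_rows(
        scale(matmul(left, transpose(right)), 1.0 / math.sqrt(n)))  # C x C
    channel_out = matmul(channel_attn, right)                       # C x N

    # Residual fusion of the two branches with the original left features.
    fused = [[l + s + ch for l, s, ch in zip(l_row, s_row, c_row)]
             for l_row, s_row, c_row in zip(left, spatial_out, channel_out)]
    return fused, spatial_attn, channel_attn
```

Calling `cross_dimensional_attention(left, right)` with two C x N lists of floats returns the fused left-view features together with the N x N spatial and C x C channel attention maps; swapping the arguments gives the right-view counterpart. A practical network would add learned projections and normalization, which are omitted here.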

Data availability

The data supporting the findings of this study are openly available to the public. The KITTI2012 and KITTI2015 data sets are accessible at https://www.cvlibs.net/datasets/kitti/eval_stereo.php. The Flickr1024 data set is accessible at https://yingqianwang.github.io/Flickr1024/. The Middlebury data set is accessible at https://vision.middlebury.edu/stereo/data/.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants No. U19B2037 and No. 61901384, and by the Natural Science Basic Research Program of Shaanxi Province (Program No. 2021JCW-03 and No. 2023-JC-QN-0685).

Author information

Authors and Affiliations

School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, China
Jingcheng Zhang, Yu Zhu, Axi Niu, Qingsen Yan & Yanning Zhang
School of Astronautics, Northwestern Polytechnical University, Xi’an, 710072, China
Jinqiu Sun
China Xi’an Satellite Control Center, Xi’an, 710699, China
Shengjun Peng

Contributions

Jingcheng Zhang and Yu Zhu designed the algorithm and wrote the paper. Axi Niu and Qingsen Yan performed the experiments. Shengjun Peng, Jinqiu Sun, and Yanning Zhang helped improve the experiments and wrote the paper.

Corresponding author

Correspondence to Yu Zhu.

Ethics declarations

Conflict of interest

All authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent

Not applicable.

Additional information

Communicated by Bing-kun Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, J., Zhu, Y., Peng, S. et al. A multi-scale feature cross-dimensional interaction network for stereo image super-resolution. Multimedia Systems 31, 114 (2025). https://doi.org/10.1007/s00530-025-01714-8

Received: 07 October 2024
Accepted: 05 February 2025
Published: 17 February 2025
DOI: https://doi.org/10.1007/s00530-025-01714-8
