A multi-scale feature cross-dimensional interaction network for stereo image super-resolution

  • Regular Paper
  • Multimedia Systems

Abstract

Stereo image super-resolution (SSR) has recently achieved impressive performance by leveraging both intra-view and inter-view information. However, existing SSR methods often extract stereo image features at a single scale and overlook multi-dimensional feature interactions, which leads to poor visual quality: reconstructed details are blurred and insufficiently sharp. To address these issues, we propose a multi-scale feature cross-dimensional interaction network (MFCINet) for SSR. Specifically, to fully exploit intra-view information, we design multi-scale feature extraction blocks that capture abundant texture patterns at different scales, including the Local Feature Extraction Block (LFEB), the Mesoscale Feature Extraction Block (MFEB), and the Global Feature Extraction Block (GFEB). We progressively fuse smaller-scale features with larger-scale features, so that the local texture information in the smaller-scale features refines the global structure information in the larger-scale features. To explore richer interactions between complementary features, we introduce the Cross-dimensional Attention Interaction Block (CAIB), which computes attention between complementary features across different spatial positions and channels and thereby enables comprehensive interaction across these dimensions. Extensive experiments and ablation studies demonstrate that MFCINet better leverages intra-view and inter-view information to reconstruct clear texture details, achieving competitive results and outperforming state-of-the-art methods.
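
To make the idea of attention across both spatial positions and channels more concrete, the following is a minimal, dependency-free sketch. It is not the CAIB described in the paper, whose layers and fusion rule are not given in the abstract. It assumes a single scanline of W positions with C channels per view, plain scaled dot-product attention, and a simple additive fusion; the names cross_attention and cross_dimensional_interaction are hypothetical.

```python
import math


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(query, source):
    """Let each row of `query` attend over the rows of `source`.

    Returns the attention map and the attended features (attn @ source).
    """
    scale = 1.0 / math.sqrt(len(query[0]))
    scores = [[v * scale for v in row] for row in matmul(query, transpose(source))]
    attn = softmax_rows(scores)
    return attn, matmul(attn, source)


def cross_dimensional_interaction(left, right):
    """Fuse right-view features into the left view along two dimensions.

    left, right: W x C matrices (W positions on a scanline, C channels).
    Assumption: the two attended results are simply added to `left`.
    """
    # Spatial dimension: each left position attends over right positions.
    _, spatial = cross_attention(left, right)  # W x C
    # Channel dimension: each left channel attends over right channels.
    _, channel_t = cross_attention(transpose(left), transpose(right))  # C x W
    channel = transpose(channel_t)  # back to W x C
    return [
        [l + s + c for l, s, c in zip(l_row, s_row, c_row)]
        for l_row, s_row, c_row in zip(left, spatial, channel)
    ]


if __name__ == "__main__":
    # Toy features: 3 positions, 2 channels per view.
    left_feat = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    right_feat = [[0.5, 0.2], [0.1, 0.9], [0.3, 0.3]]
    for row in cross_dimensional_interaction(left_feat, right_feat):
        print(row)
```

The same function can be applied with the two views swapped to update the right view. In a real network the queries, keys, and values would come from learned projections rather than the raw features used here.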

Data availability

The data supporting the findings of this study are openly available. The KITTI2012 and KITTI2015 data sets are accessible at https://www.cvlibs.net/datasets/kitti/eval_stereo.php. The Flickr1024 data set is accessible at https://yingqianwang.github.io/Flickr1024/. The Middlebury data set is accessible at https://vision.middlebury.edu/stereo/data/.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. U19B2037 and 61901384, and by the Natural Science Basic Research Program of Shaanxi Province (Program Nos. 2021JCW-03 and 2023-JC-QN-0685).

Author information

Contributions

Jingcheng Zhang and Yu Zhu designed the algorithm and wrote the paper. Axi Niu and Qingsen Yan performed the experiments. Shengjun Peng, Jinqiu Sun, and Yanning Zhang helped improve the experiments and contributed to writing the paper.

Corresponding author

Correspondence to Yu Zhu.

Ethics declarations

Conflict of interest

All authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent

Not applicable.

Additional information

Communicated by Bing-kun Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, J., Zhu, Y., Peng, S. et al. A multi-scale feature cross-dimensional interaction network for stereo image super-resolution. Multimedia Systems 31, 114 (2025). https://doi.org/10.1007/s00530-025-01714-8

  • DOI: https://doi.org/10.1007/s00530-025-01714-8
