Abstract
Stereo image super-resolution (SSR) has recently achieved impressive performance by leveraging both intra-view and inter-view information. However, existing SSR methods often rely on single-scale features when extracting stereo image features and overlook multi-dimensional feature interactions, so reconstructed details remain blurry and visual quality suffers. To address these issues, we propose a multi-scale feature cross-dimensional interaction network (MFCINet) for SSR. Specifically, to fully exploit intra-view information, we design multi-scale feature extraction blocks that capture abundant multi-scale texture patterns: the Local Feature Extraction Block (LFEB), the Mesoscale Feature Extraction Block (MFEB), and the Global Feature Extraction Block (GFEB). We progressively fuse smaller-scale features with larger-scale features, using the local texture information in the smaller-scale features to refine the global structure information in the larger-scale features. To explore richer interactions between complementary features, we introduce the Cross-dimensional Attention Interaction Block (CAIB), which computes attention between complementary features across different spatial positions and channels, enabling comprehensive interaction across dimensions. Extensive experiments and ablation studies demonstrate that MFCINet makes better use of intra-view and inter-view information to reconstruct clear texture details, achieving competitive results and outperforming state-of-the-art methods.
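The abstract does not specify how the CAIB is built, so the following is only a minimal NumPy sketch of the general idea of attending between two views along both the spatial and the channel dimension. It is not the authors' implementation. All function names are hypothetical, the feature maps are random placeholders, and the row-wise spatial attention assumes rectified stereo pairs in which corresponding pixels lie on the same image row.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_cross_attention(query_view, other_view):
    """Each pixel of query_view attends to the pixels on the same row of other_view.

    Both inputs have shape (C, H, W). Assumes rectified stereo pairs.
    Returns the attended features (C, H, W) and the weights (H, W, W).
    """
    c, _, _ = query_view.shape
    q = query_view.transpose(1, 2, 0)               # (H, W, C)
    k = other_view.transpose(1, 2, 0)               # (H, W, C)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)  # (H, W, W)
    weights = softmax(scores, axis=-1)
    attended = weights @ k                          # (H, W, C)
    return attended.transpose(2, 0, 1), weights


def channel_cross_attention(query_view, other_view):
    """Each channel of query_view attends to the channels of other_view.

    Returns the attended features (C, H, W) and the weights (C, C).
    """
    c, h, w = query_view.shape
    q = query_view.reshape(c, h * w)                # (C, HW)
    k = other_view.reshape(c, h * w)                # (C, HW)
    scores = q @ k.T / np.sqrt(h * w)               # (C, C)
    weights = softmax(scores, axis=-1)
    attended = weights @ k                          # (C, HW)
    return attended.reshape(c, h, w), weights


def cross_dimensional_interaction(left, right):
    """Hypothetical fusion: add both attended signals back as residuals."""
    l_sp, _ = spatial_cross_attention(left, right)
    l_ch, _ = channel_cross_attention(left, right)
    r_sp, _ = spatial_cross_attention(right, left)
    r_ch, _ = channel_cross_attention(right, left)
    return left + l_sp + l_ch, right + r_sp + r_ch


# Placeholder feature maps: 8 channels, 4 rows, 16 columns.
rng = np.random.default_rng(0)
left = rng.standard_normal((8, 4, 16))
right = rng.standard_normal((8, 4, 16))
left_out, right_out = cross_dimensional_interaction(left, right)
print(left_out.shape, right_out.shape)
```

The residual sum is the simplest possible way to combine the two attended signals and was chosen here for illustration only. A trained network would normally use learned projections for the queries, keys, and values.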
Data availability
The data supporting the findings of this study are openly available to the public. The KITTI2012 and KITTI2015 data sets are accessible at https://www.cvlibs.net/datasets/kitti/eval_stereo.php. The Flickr1024 data set is accessible at https://yingqianwang.github.io/Flickr1024/. The Middlebury data set is accessible at https://vision.middlebury.edu/stereo/data/.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants No. U19B2037 and No. 61901384, and by the Natural Science Basic Research Program of Shaanxi Province (Program No. 2021JCW-03 and No. 2023-JC-QN-0685).
Author information
Contributions
Jingcheng Zhang and Yu Zhu designed the algorithm and wrote the paper. Axi Niu and Qingsen Yan performed the experiments. Shengjun Peng, Jinqiu Sun, and Yanning Zhang helped improve the experiments and contributed to writing the paper.
Ethics declarations
Conflict of interest
All authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and informed consent
Not applicable.
Additional information
Communicated by Bing-kun Bao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, J., Zhu, Y., Peng, S. et al. A multi-scale feature cross-dimensional interaction network for stereo image super-resolution. Multimedia Systems 31, 114 (2025). https://doi.org/10.1007/s00530-025-01714-8
DOI: https://doi.org/10.1007/s00530-025-01714-8