
RT-SwinIR: an improved digital wallchart image super-resolution with attention-based learned text loss

  • Original article
  • Published in The Visual Computer

Abstract

In recent years, image super-resolution (SR) has made remarkable progress in domains such as natural images and text images. In the field of digital wallchart image super-resolution, however, existing methods fail to preserve the finer details of text regions while restoring graphics. To address this challenge, we present a new model called Real Text-SwinIR (RT-SwinIR), which employs a novel plug-and-play Attention-based Learned Text Loss (LTL) to enhance the architecture’s ability to render clear text structure while preserving the clarity of graphics. To evaluate the effectiveness of our method, we collected a dataset of digital wallcharts and subjected them to a two-order degradation process that simulates real-world damage, including creases and stains on the wallcharts as well as the noise and blurriness introduced by compression during network transmission. On the proposed dataset, RT-SwinIR achieves the best scores of 0.58 on Learned Text Loss and 0.11 on LPIPS, reductions of 41.4% and 35.3% on average, respectively, relative to prior methods. Experiments show that our method outperforms prior works on digital wallchart image super-resolution, indicating superior visual perceptual quality.
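The core idea of an attention-based text loss, as described above, is to up-weight reconstruction error in text regions so that strokes stay sharp. A minimal sketch of such a weighted pixel loss is shown below; this is a hypothetical illustration, not the authors' implementation, and the `attention` map stands in for whatever text-region attention RT-SwinIR learns end-to-end:

```python
def attention_weighted_l1(sr, hr, attention, eps=1e-8):
    """L1 loss where each pixel's error is scaled by a text-attention weight.

    sr, hr    : 2D lists (H x W) of pixel intensities in [0, 1]
                (super-resolved output and high-resolution ground truth)
    attention : 2D list (H x W) of weights in [0, 1]; high near text strokes.
                In the paper this map is learned; here it is assumed given.
    """
    num, den = 0.0, 0.0
    for sr_row, hr_row, a_row in zip(sr, hr, attention):
        for s, h, a in zip(sr_row, hr_row, a_row):
            w = 1.0 + a          # base weight 1, boosted inside text regions
            num += w * abs(s - h)
            den += w
    return num / (den + eps)
```

With a zero attention map this reduces to plain mean absolute error; placing attention on a mis-reconstructed text pixel increases the penalty for that pixel, which is the behavior a learned text loss would encourage.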


Data Availability

Part of the datasets analyzed during the current study consists of valuable wallcharts belonging to our schools; these are not publicly available at this time but can be obtained from the corresponding author upon reasonable request.

Other datasets analyzed during the current study are available in the Flip chart library of popular science resources.


Acknowledgements

This work was supported by the National College Student Innovation and Entrepreneurship Training Program under project number 202210712194. Special thanks go to Yao Zhou.

Funding

The research leading to these results received funding from the National College Student Innovation and Entrepreneurship Training Program under grant agreement number 202210712194.

Author information

Corresponding author

Correspondence to MeiLi Wang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xue, F., Zhou, M., Zhang, C. et al. Rt-swinir: an improved digital wallchart image super-resolution with attention-based learned text loss. Vis Comput 39, 3467–3479 (2023). https://doi.org/10.1007/s00371-023-03017-3

