
Advancing scene text image super-resolution via edge enhancement priors

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Scene text image super-resolution aims to improve both the resolution and the readability of low-resolution text images. Despite significant progress in this field, the blending of text with the image background remains an unresolved problem. Existing methods that use text priors from pre-trained text recognizers to guide reconstruction often overlook contextual semantics. When these priors are fused with text image features, the methods are also susceptible to interference from redundant information, which misguides text reconstruction. To address these challenges, we propose a network based on edge enhancement priors (EEP). EEP first applies the Canny operator and then employs a pixel attention module to obtain edge-enhanced feature maps, thereby avoiding text-background blending. The edge features further enhance the text priors, strengthening contextual semantic information. We then propose a novel sequence reconstruction module based on the edge-enhanced priors, which reduces the impact of redundant information on image reconstruction and yields superior super-resolution results. Extensive experiments demonstrate that EEP outperforms other state-of-the-art deep learning methods, with a 0.6% improvement in recognition accuracy using the ASTER recognizer and a PSNR gain of 0.53 dB on the TextZoom dataset.
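The abstract names two ingredients of the edge branch: a Canny edge map and a pixel attention module that turns it into edge-enhanced features. The exact architecture is not given here, so the following is only a minimal NumPy sketch under stated assumptions. It uses a textbook Canny detector (Gaussian smoothing, Sobel gradients, non-maximum suppression, double-threshold hysteresis) with thresholds taken on the normalized gradient magnitude, and a PAN-style pixel attention, assumed here to be a 1x1 convolution over the concatenated features and edge map followed by a sigmoid gate. The function names `canny` and `pixel_attention`, the thresholds, and the weight shapes are hypothetical, not the authors' code.

```python
import numpy as np


def _conv2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size 2-D cross-correlation of a single-channel image (edge padding)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def canny(img: np.ndarray, low: float = 0.1, high: float = 0.3,
          sigma: float = 1.0) -> np.ndarray:
    """Textbook Canny detector; returns a binary (0/1) float edge map.

    Assumption: thresholds are relative to the normalized gradient magnitude.
    """
    # 1. Gaussian smoothing with a 5x5 kernel.
    ax = np.arange(-2, 3)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    g = np.outer(g, g)
    smooth = _conv2d(img.astype(np.float64), g / g.sum())

    # 2. Sobel gradients, magnitude, and direction folded into [0, 180).
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    gx, gy = _conv2d(smooth, kx), _conv2d(smooth, kx.T)
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-12
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0

    # 3. Non-maximum suppression along the quantized gradient direction.
    h, w = img.shape
    nms = np.zeros_like(mag)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = angle[y, x]
            if a < 22.5 or a >= 157.5:      # horizontal gradient
                n1, n2 = mag[y, x - 1], mag[y, x + 1]
            elif a < 67.5:                  # 45-degree gradient
                n1, n2 = mag[y - 1, x - 1], mag[y + 1, x + 1]
            elif a < 112.5:                 # vertical gradient
                n1, n2 = mag[y - 1, x], mag[y + 1, x]
            else:                           # 135-degree gradient
                n1, n2 = mag[y - 1, x + 1], mag[y + 1, x - 1]
            if mag[y, x] >= n1 and mag[y, x] >= n2:
                nms[y, x] = mag[y, x]

    # 4. Double threshold, then hysteresis: grow strong edges into
    #    8-connected weak pixels until nothing changes.
    weak = nms >= low
    edges = nms >= high
    while True:
        p = np.pad(edges, 1)
        grown = np.zeros_like(edges)
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                grown |= p[dy:dy + h, dx:dx + w]
        new = edges | (grown & weak)
        if np.array_equal(new, edges):
            return edges.astype(np.float64)
        edges = new


def pixel_attention(feat, edge, weight, bias):
    """PAN-style pixel attention gated by an edge map (assumed design).

    feat: (C, H, W) features; edge: (H, W) edge map;
    weight: (C, C + 1) 1x1-conv kernel; bias: (C,).
    Returns feat * sigmoid(conv1x1([feat; edge])).
    """
    x = np.concatenate([feat, edge[None]], axis=0)          # (C + 1, H, W)
    logits = np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]
    return feat * (1.0 / (1.0 + np.exp(-logits)))


if __name__ == "__main__":
    img = np.zeros((32, 32))
    img[8:24, 8:24] = 1.0                     # toy "glyph": bright square on dark background
    edge = canny(img)
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 32, 32))   # stand-in for backbone features
    out = pixel_attention(feat, edge, 0.1 * rng.standard_normal((4, 5)), np.zeros(4))
    print(int(edge.sum()), out.shape)
```

Because the sigmoid gate lies in (0, 1), each feature value is rescaled rather than replaced, and feeding the edge map into the gate lets stroke boundaries be weighted differently from flat background. Whether EEP adds a residual connection or fuses the edge map differently is not stated in the abstract.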

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Data availability

No datasets were generated or analysed during the current study.

Funding

This work is supported in part by the National Natural Science Foundation of China under Grant 62371261, in part by the Nantong Science and Technology Program (JC2023076), and in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX24_3643).

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Hongjun Li and Shangfeng Li. The first draft of the manuscript was written by Shangfeng Li and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hongjun Li.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Li, S. Advancing scene text image super-resolution via edge enhancement priors. SIViP 18, 8241–8250 (2024). https://doi.org/10.1007/s11760-024-03467-9

  • DOI: https://doi.org/10.1007/s11760-024-03467-9

Keywords