Abstract
Scene text image super-resolution aims to simultaneously enhance the readability and resolution of low-resolution text images. Despite significant progress in this field, the blending of text with the background remains an unresolved issue. Existing methods that use pre-trained text recognizers to guide reconstruction through text priors often overlook contextual semantics and are susceptible to interference from redundant information when these priors are fused with text image features, which can misguide text reconstruction. To address these challenges, we propose a network based on edge enhancement priors (EEP). EEP first applies the Canny operator together with a pixel attention module to obtain edge-enhanced feature maps, mitigating the text-background blending problem. The edge features further strengthen the text priors, enriching their contextual semantic information. We then propose a novel sequence reconstruction module based on these edge-enhanced priors, which reduces the impact of redundant information during image reconstruction and yields superior super-resolution results. Extensive experiments demonstrate that EEP achieves remarkable performance compared with other state-of-the-art deep learning methods, with a 0.6% improvement in recognition accuracy using the ASTER recognizer and a 0.53 dB PSNR gain on the TextZoom dataset.
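The edge-enhancement step described above (edge extraction followed by per-pixel attention gating) can be sketched roughly as follows. This is an illustrative approximation, not the authors' implementation: for brevity it substitutes a Sobel gradient magnitude for the full Canny pipeline (which additionally performs Gaussian smoothing, non-maximum suppression, and hysteresis thresholding), the helper names `sobel_edges` and `pixel_attention` are hypothetical, and a fixed sigmoid gate stands in for the paper's learned pixel attention module.

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Simplified edge map: Sobel gradient magnitude of a grayscale image.

    The paper uses the Canny operator; Canny builds on this gradient step
    with non-maximum suppression and hysteresis thresholding.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w), dtype=float)
    gy = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)  # gradient magnitude per pixel

def pixel_attention(features: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Gate features with a per-pixel weight map derived from the edge map.

    A sigmoid maps edge strength to weights in (0.5, 1.0), so pixels on
    text strokes are emphasized relative to flat background regions.
    """
    attn = 1.0 / (1.0 + np.exp(-edges))
    return features * attn  # edge-enhanced feature map
```

In the full model these edge-enhanced features would then feed both the text-prior branch and the sequence reconstruction module, rather than being used directly as output.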
Data availability
No datasets were generated or analysed during the current study.
Funding
This work is supported in part by the National Natural Science Foundation of China under Grant 62371261, in part by the Nantong Science and Technology Program under Grant JC2023076, and in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province under Grant KYCX24_3643.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Hongjun Li and Shangfeng Li. The first draft of the manuscript was written by Shangfeng Li and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Li, H., Li, S. Advancing scene text image super-resolution via edge enhancement priors. SIViP 18, 8241–8250 (2024). https://doi.org/10.1007/s11760-024-03467-9