STIRER: A Unified Model for Low-Resolution Scene Text Image Recovery and Recognition

Authors: Minyi Zhao, Shijie Xuyang, Jihong Guan, Shuigeng Zhou

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 7530 - 7539

https://doi.org/10.1145/3581783.3612488

Published: 27 October 2023

Abstract

Although scene text recognition (STR) from high-resolution (HR) images has achieved significant success in recent years, text recognition from low-resolution (LR) images remains challenging. This has motivated work on scene text image super-resolution (STISR), which generates super-resolution (SR) images from LR images; STR is then performed on the generated SR images to boost recognition performance. However, existing methods have two major drawbacks: 1) STISR models may generate imperfect SR images, which mislead the subsequent recognition; 2) because STISR models are optimized for high recognition accuracy, the fidelity of the SR images may be degraded. Consequently, neither the recognition performance of STR nor the fidelity of STISR is satisfactory. In this paper, a novel model called STIRER (short for Scene Text Image REcovery and Recognition) is proposed to effectively and simultaneously recover and recognize LR scene text images under a unified framework. Concretely, STIRER consists of a feature encoder that obtains pixel features and two dedicated decoders that, based on the encoded features and the raw LR images, generate SR images and recognize text, respectively. We propose a progressive scene text swin transformer architecture as the encoder to enrich the representations of the pixel features for better recovery and recognition. Extensive experiments on two LR datasets show the superiority of our model over existing methods in recognition performance, super-resolution fidelity, and computational cost. The STIRER code is available at https://github.com/zhaominyiz/STIRER.
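
The data flow described in the abstract (one shared encoder, two decoders that each receive the encoded features together with the raw LR image) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `UnifiedRecoveryRecognition` and the three `toy_*` functions are hypothetical stand-ins, and simple smoothing replaces the progressive scene text swin transformer encoder purely so the example runs with the standard library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image: rows of pixel intensities in [0, 1]


@dataclass
class UnifiedRecoveryRecognition:
    """One shared encoder feeding two task-specific decoders (illustrative only)."""
    encoder: Callable[[Image], Image]
    sr_decoder: Callable[[Image, Image], Image]
    rec_decoder: Callable[[Image, Image], str]

    def forward(self, lr: Image) -> Tuple[Image, str]:
        features = self.encoder(lr)            # pixel features, computed once
        sr = self.sr_decoder(features, lr)     # recovery branch: features + raw LR
        text = self.rec_decoder(features, lr)  # recognition branch: same inputs
        return sr, text


# Hypothetical stand-ins so the sketch runs; none of these come from the paper.
def toy_encoder(lr: Image) -> Image:
    """Horizontal smoothing: average each pixel with its left and right neighbours."""
    out = []
    for row in lr:
        w = len(row)
        out.append([
            (row[max(x - 1, 0)] + row[x] + row[min(x + 1, w - 1)]) / 3.0
            for x in range(w)
        ])
    return out


def toy_sr_decoder(features: Image, lr: Image, scale: int = 2) -> Image:
    """Blend features with the raw LR pixels, then upsample by pixel repetition."""
    blended = [[0.5 * f + 0.5 * p for f, p in zip(frow, prow)]
               for frow, prow in zip(features, lr)]
    sr = []
    for row in blended:
        wide = [v for v in row for _ in range(scale)]
        sr.extend([list(wide) for _ in range(scale)])
    return sr


def toy_rec_decoder(features: Image, lr: Image) -> str:
    """Emit one symbol per column: '#' where the mean blended intensity exceeds 0.5."""
    h = len(lr)
    symbols = []
    for x in range(len(lr[0])):
        mean = sum(0.5 * features[y][x] + 0.5 * lr[y][x] for y in range(h)) / h
        symbols.append("#" if mean > 0.5 else ".")
    return "".join(symbols)


model = UnifiedRecoveryRecognition(toy_encoder, toy_sr_decoder, toy_rec_decoder)
lr_image = [[0.0, 1.0, 1.0, 0.0],
            [0.0, 1.0, 1.0, 0.0]]
sr_image, text = model.forward(lr_image)
```

The point of the sketch is structural: recognition reads the encoded features and the LR input directly rather than the generated SR image, so an imperfect SR output cannot mislead it, and the features are computed only once for both tasks.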
