Abstract
Text image super-resolution is a pre-processing step for scene text recognition that aims to improve the visual quality of text in low-resolution images. However, existing super-resolution (SR) models designed for general images have difficulty recovering text from low-resolution images captured in real scenes. There are several reasons for this, including that such models do not consider text-specific properties and that the background is unimportant for text image SR. In this paper, we propose TRSRT, a transformer-based multi-task learning model for text image reconstruction and SR. Compared with the SR model, the reconstruction model is better at denoising and tends to capture structural information about the text. Building on this observation, the proposed method transfers these properties of the reconstruction model to the SR model through the transformer. In addition, we aim to obtain a text-specific model by training with three loss functions, including a feature-driven loss computed with a text recognizer. Experimental results on TextZoom show that the proposed method achieves performance comparable to state-of-the-art methods and demonstrate the advantages of multi-task learning.
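The abstract states only that training combines three loss functions, one of them a feature-driven loss computed with a text recognizer. The Python sketch below illustrates, under stated assumptions, how such a multi-task objective could be assembled. Everything beyond the existence of a feature-driven term is hypothetical and not taken from TRSRT: the other two terms (a pixel loss on the SR output and a pixel loss on the reconstruction output), the weights `w_rec` and `w_feat`, the reconstruction target `rec_target`, and the stand-in feature extractor `toy_recognizer_features`.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty flat vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def flatten(img: Image) -> List[float]:
    """Row-major flattening of a 2-D image."""
    return [p for row in img for p in row]


def toy_recognizer_features(img: Image) -> List[float]:
    """HYPOTHETICAL stand-in for a text recognizer's intermediate features.

    Returns column means (a crude horizontal intensity profile). A real
    feature-driven loss would use feature maps of a pretrained recognizer.
    """
    n_rows = len(img)
    return [sum(row[c] for row in img) / n_rows for c in range(len(img[0]))]


def multitask_loss(
    sr_out: Image,
    rec_out: Image,
    hr_target: Image,
    rec_target: Image,  # ASSUMED target for the reconstruction branch
    features: Callable[[Image], List[float]] = toy_recognizer_features,
    w_rec: float = 1.0,   # ASSUMED weight of the reconstruction term
    w_feat: float = 0.1,  # ASSUMED weight of the feature-driven term
) -> float:
    """Weighted sum of three terms (the choice of terms is an assumption):

    1. pixel MSE between the SR output and the HR ground truth,
    2. pixel MSE between the reconstruction output and its target,
    3. feature-driven MSE between recognizer features of the SR output
       and of the HR ground truth.
    """
    l_sr = mse(flatten(sr_out), flatten(hr_target))
    l_rec = mse(flatten(rec_out), flatten(rec_target))
    l_feat = mse(features(sr_out), features(hr_target))
    return l_sr + w_rec * l_rec + w_feat * l_feat


if __name__ == "__main__":
    hr = [[0.0, 0.0], [0.0, 0.0]]
    sr = [[1.0, 1.0], [1.0, 1.0]]
    # l_sr = 1.0, l_rec = 4.0, l_feat = 1.0, so 1.0 + 4.0 + 0.1 * 1.0
    print(multitask_loss(sr, [[2.0]], hr, [[0.0]]))
```

In an actual model the inputs would be tensors and `features` would presumably wrap a frozen, pretrained recognizer, so that the feature-driven term penalizes SR outputs whose recognizer features differ from those of the HR image even when their pixel errors are small.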
Acknowledgements
This study is supported by JSPS/JAPAN KAKENHI (Grants-in-Aid for Scientific Research) #JP20K11955.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Honda, K., Fujita, H., Kurematsu, M. (2022). Improvement of Text Image Super-Resolution Benefiting Multi-task Learning. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science, vol. 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_23
DOI: https://doi.org/10.1007/978-3-031-08530-7_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08529-1
Online ISBN: 978-3-031-08530-7
eBook Packages: Computer Science, Computer Science (R0)