Self-supervised Attribute-Aware Refinement Network for Low-Quality Text Recognition

Lee, Younkwan; Yoo, Heongjun; Kim, Yechan; Jeong, Jihun; Jeon, Moongu

doi:10.1007/978-3-030-68238-5_17

Conference paper
First Online: 31 January 2021

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12539)

Abstract

Scene text collected from unconstrained environments exhibits various types of degradation, including low resolution, cluttered backgrounds, and irregular shapes. Training a text recognition model on such degraded inputs is notoriously hard. In this work, we analyze this problem in terms of two attributes, a semantic attribute and a geometric attribute, which are crucial cues for describing low-quality text. To handle this issue, we propose a new Self-supervised Attribute-Aware Refinement Network (SAAR-Net) that addresses these attributes simultaneously. Specifically, a novel text refining mechanism is combined with self-supervised learning on multiple auxiliary tasks. In addition, the network extracts the semantic and geometric attributes important to text recognition by introducing a mutual information constraint that explicitly preserves invariant and discriminative information across different tasks. The learned representation encourages our method to generate a clear image, leading to better recognition performance. Extensive results demonstrate its effectiveness in both refinement and recognition.
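
The abstract does not say how the mutual information constraint is estimated, so the following is only a reminder of the quantity involved, not the authors' method. It is a minimal sketch that computes the empirical mutual information I(X;Y) between two discrete variables from paired samples; the function name `mutual_information` and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information I(X;Y), in nats, from (x, y) samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(pairs)
    joint = Counter(pairs)               # counts of each (x, y) pair
    px = Counter(x for x, _ in pairs)    # marginal counts of x
    py = Counter(y for _, y in pairs)    # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log(c * n / (px[x] * py[y]))
    return mi


# Toy data: y fully determined by x, versus y independent of x.
dependent = [(0, 0), (1, 1)] * 50
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25

mi_dependent = mutual_information(dependent)
mi_independent = mutual_information(independent)
```

In this toy setting the dependent pairs give the maximum value for a fair binary variable (the natural log of 2), while the independent pairs give zero. A constraint that preserves information shared across tasks would push a learned representation toward the first case.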

Acknowledgements

This work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2014-3-00077, AI National Strategy Project) and by the Ministry of Culture, Sports and Tourism and the Korea Creative Content Agency (Project Number: R2020070004).

Author information

Authors and Affiliations

Gwangju Institute of Science and Technology, Gwangju, South Korea
Younkwan Lee, Heongjun Yoo, Yechan Kim, Jihun Jeong & Moongu Jeon
Korea Culture Technology Institute, Gwangju, South Korea
Moongu Jeon

Corresponding author

Correspondence to Younkwan Lee.

Editor information

Editors and Affiliations

University of Clermont Auvergne, Clermont Ferrand, France
Adrien Bartoli
Università degli Studi di Udine, Udine, Italy
Andrea Fusiello

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1384 KB)

Cite this paper

Lee, Y., Yoo, H., Kim, Y., Jeong, J., Jeon, M. (2020). Self-supervised Attribute-Aware Refinement Network for Low-Quality Text Recognition. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12539. Springer, Cham. https://doi.org/10.1007/978-3-030-68238-5_17

DOI: https://doi.org/10.1007/978-3-030-68238-5_17
Published: 31 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68237-8
Online ISBN: 978-3-030-68238-5
eBook Packages: Computer Science, Computer Science (R0)