Self-supervised Attribute-Aware Refinement Network for Low-Quality Text Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12539)

Abstract

Scene text collected from unconstrained environments exhibits various types of degradation, including low resolution, cluttered backgrounds, and irregular shapes. Training a text recognition model on such degraded inputs is notoriously hard. In this work, we analyze this problem in terms of two attributes, a semantic attribute and a geometric attribute, which are crucial cues for describing low-quality text. To address both attributes simultaneously, we propose a new Self-supervised Attribute-Aware Refinement Network (SAAR-Net). Specifically, a novel text refining mechanism is combined with self-supervised learning on multiple auxiliary tasks. In addition, the network extracts the semantic and geometric attributes important to text recognition by introducing a mutual information constraint that explicitly preserves invariant and discriminative information across the different tasks. The learned representation encourages our method to generate a clear image, thus leading to better recognition performance. Extensive results demonstrate the effectiveness of the method in both refinement and recognition.
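
The abstract does not say how the mutual information constraint is estimated or optimized, so the following is only a minimal sketch of the quantity itself, not of SAAR-Net. It computes the mutual information I(X;Y) of two discrete variables from a joint probability table. The function name `mutual_information` and the two toy tables are assumptions introduced here for illustration.

```python
import math

def mutual_information(joint):
    """Mutual information I(X;Y) in nats for a discrete joint table.

    `joint` is a list of rows; entry [i][j] is proportional to P(X=i, Y=j).
    This is a hypothetical helper for illustration only.
    """
    total = sum(sum(row) for row in joint)
    p = [[v / total for v in row] for row in joint]   # normalise to sum to 1
    px = [sum(row) for row in p]                      # marginal of X (rows)
    py = [sum(col) for col in zip(*p)]                # marginal of Y (columns)
    mi = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            if v > 0:                                 # 0 * log 0 is taken as 0
                mi += v * math.log(v / (px[i] * py[j]))
    return mi

# Toy tables (assumed): one where X and Y are independent,
# one where Y is fully determined by X.
independent = [[0.25, 0.25], [0.25, 0.25]]
shared = [[0.5, 0.0], [0.0, 0.5]]

print(mutual_information(independent))
print(mutual_information(shared))
```

The independent table carries no shared information, while the second table carries log 2 nats. A constraint that keeps mutual information high between representations pushes them toward the second case, which is one way to read "preserves invariant and discriminative information across different tasks".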

Acknowledgements

This work was partly supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2014-3-00077, AI National Strategy Project) and Ministry of Culture, Sports and Tourism and Korea Creative Content Agency(Project Number: R2020070004).

Author information

Correspondence to Younkwan Lee.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Lee, Y., Yoo, H., Kim, Y., Jeong, J., Jeon, M. (2020). Self-supervised Attribute-Aware Refinement Network for Low-Quality Text Recognition. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12539. Springer, Cham. https://doi.org/10.1007/978-3-030-68238-5_17

  • DOI: https://doi.org/10.1007/978-3-030-68238-5_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68237-8

  • Online ISBN: 978-3-030-68238-5

  • eBook Packages: Computer Science, Computer Science (R0)
