Adversarial Rectification Network for Scene Text Regularization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)

Abstract

Recognizing scene text with irregular layouts is a challenging yet important problem in computer vision. A widely used approach is to apply a rectification network before the recognition stage. However, most previous rectification methods either ignore recognition information or are integrated into end-to-end recognition models without treating rectification explicitly. To overcome this issue, we propose an adversarial learning-based rectification network that integrates the transformation from irregular text to regular text with recognition information in a unified framework. In this framework, we optimize the rectification network with an extended Generative Adversarial Network in which the rectifier and the discriminator compete, together with the results of a recognizer. To evaluate rectification performance, we generated a set of regular-irregular pairs from the benchmark datasets. Experimental results show that the proposed method significantly improves rectification performance while keeping recognition performance comparable. Specifically, PSNR and SSIM are improved by 0.81 and 0.051, respectively.
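The abstract reports rectification quality in terms of PSNR and SSIM. As a reminder of what the first metric measures, here is a minimal sketch of the standard PSNR definition in plain Python. It is not the authors' evaluation code: the function name `psnr`, the flat pixel-sequence input, and the default 8-bit peak value of 255 are assumptions made for illustration.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float],
         rectified: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two images.

    Both images are given as flat sequences of pixel intensities
    (an assumption for this sketch); max_value is the largest
    possible intensity, 255 for 8-bit images.
    """
    if len(reference) == 0 or len(reference) != len(rectified):
        raise ValueError("images must be non-empty and equally sized")

    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, rectified)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the rectified image is closer, pixel by pixel, to its regular reference, so the reported gain of 0.81 would correspond to a lower mean squared error against the regular image of each pair.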

Acknowledgements

This study was funded by the National Natural Science Foundation of China under grant nos. 61876154 and 61876155; the Natural Science Foundation of Jiangsu Province under grant nos. BK20181189 and BK20181190; the Key Program Special Fund in XJTLU under grant nos. KSF-A-10, KSF-A-01, KSF-P-02, KSF-E-26 and KSF-T-06; and the XJTLU Research Development Fund under grant nos. RDF-16-02-49 and RDF-16-01-57.

Author information

Corresponding author

Correspondence to Qiu-Feng Wang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, J., Wang, QF., Zhang, R., Huang, K. (2020). Adversarial Rectification Network for Scene Text Regularization. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_13

  • DOI: https://doi.org/10.1007/978-3-030-63833-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63832-0

  • Online ISBN: 978-3-030-63833-7

  • eBook Packages: Computer Science, Computer Science (R0)
