Abstract
Recognizing scene text with irregular layouts is a challenging yet important problem in computer vision. A widely used approach is to apply a rectification network before the recognition stage. However, most previous rectification methods either ignore recognition information or are embedded in end-to-end recognition models without modeling rectification explicitly. To address this issue, we propose an adversarial learning-based rectification network that integrates the transformation from irregular to regular text with recognition information in a unified framework. In this framework, the rectification network is optimized with an extended Generative Adversarial Network, in which a rectifier competes against a discriminator, together with the results of a recognizer. To evaluate rectification performance, we generated a set of regular-irregular pairs from the benchmark datasets. Experimental results show that the proposed method significantly improves rectification performance while maintaining comparable recognition performance. Specifically, PSNR and SSIM are improved by 0.81 and 0.051, respectively, which demonstrates the effectiveness of the method.
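For readers unfamiliar with the PSNR metric reported above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), applied to a regular image and a rectified output. The abstract does not state how the authors computed the metric, so the 8-bit grayscale assumption (`max_value=255.0`) and the names `psnr`, `regular`, and `rectified` are illustrative assumptions, not the paper's implementation.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio between two equally sized grayscale images.

    Both images are nested lists (rows of pixel intensities).
    max_value=255.0 assumes 8-bit pixels; this is an assumption, not a
    detail taken from the paper.
    """
    ref_flat = [p for row in reference for p in row]
    cand_flat = [p for row in candidate for p in row]
    if not ref_flat or len(ref_flat) != len(cand_flat):
        raise ValueError("images must be non-empty and contain the same number of pixels")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref_flat, cand_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 example: a "regular" target and a rectified output that is slightly off.
regular = [[0, 255], [255, 0]]
rectified = [[0, 250], [255, 5]]
score = psnr(regular, rectified)
```

A higher PSNR means the rectified image is closer, pixel by pixel, to its regular counterpart, which is why a paired regular-irregular set is needed for this kind of evaluation.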
Acknowledgements
This study was funded by the National Natural Science Foundation of China under grant nos. 61876154 and 61876155; the Natural Science Foundation of Jiangsu Province under grant nos. BK20181189 and BK20181190; the Key Program Special Fund in XJTLU under grant nos. KSF-A-10, KSF-A-01, KSF-P-02, KSF-E-26 and KSF-T-06; and the XJTLU Research Development Fund under grant nos. RDF-16-02-49 and RDF-16-01-57.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, J., Wang, QF., Zhang, R., Huang, K. (2020). Adversarial Rectification Network for Scene Text Regularization. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_13
DOI: https://doi.org/10.1007/978-3-030-63833-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7
eBook Packages: Computer Science, Computer Science (R0)