A Natural Scene Text Extraction Approach Based on Generative Adversarial Learning

Xu, Huali; Su, Xiangdong; Liu, Tongyang; Guo, Pengcheng; Gao, Guanglai; Bao, Feilong

doi:10.1007/978-3-030-36708-4_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11953)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Extracting textual information embedded in natural scenes is a very challenging task, and it strongly influences the performance of subsequent text recognition and understanding. It can be seen as an image-to-image conversion task, in which we transform the foreground text in each natural image into a specified color and the background into black. After that, we use a connected component algorithm to extract text from the two-color image. Motivated by this, we propose an approach based on generative adversarial learning to perform the image-to-image conversion. The neural network in our approach consists of a generator sub-network and a discriminator sub-network, which are trained in an adversarial way with paired images (scene images and their corresponding two-color images). After the training stage, the generator network is used to perform the image conversion. Experiments on standard datasets, including the KAIST scene text database and the MSRA text detection 500 database, demonstrate that the proposed algorithm achieves very competitive performance.
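The extraction step mentioned in the abstract, connected component analysis on the two-color image, can be illustrated with a minimal sketch. The abstract does not say which connectivity or implementation the authors use, so this is only an assumption-laden illustration: the function name `extract_components`, the `min_area` filter, the use of 4-connectivity, and the convention that any non-zero pixel counts as text are all hypothetical choices, not details from the paper.

```python
from collections import deque


def extract_components(mask, min_area=1):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    foreground regions in a 2D mask, where non-zero means "text pixel".

    Assumption: the two-color generator output has already been thresholded
    into this mask (text color -> 1, black background -> 0).
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Hypothetical noise filter: drop regions smaller than min_area.
            if area >= min_area:
                boxes.append((top, left, bottom, right))
    return boxes


# Toy mask with two separate "text" regions.
demo_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(extract_components(demo_mask))  # [(0, 1, 1, 2), (1, 4, 2, 4)]
```

Each returned box could then be used to crop a candidate text region from the original scene image. In practice a library routine for connected component labeling would likely be preferred over this pure-Python loop for full-size images.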

Acknowledgement

This work was funded by the National Natural Science Foundation of China (Grant Nos. 61563040, 61773224, 61762069, 61866029), the Natural Science Foundation of Inner Mongolia Autonomous Region (Grant Nos. 2017BS0601, 2016ZD06), and the program of higher-level talents of Inner Mongolia University (Grant No. 21500-5165161).

Author information

Authors and Affiliations

College of Computer Science, Inner Mongolia University, Hohhot, China
Huali Xu, Xiangdong Su, Tongyang Liu, Pengcheng Guo, Guanglai Gao & Feilong Bao

Corresponding author

Correspondence to Xiangdong Su.

Editor information

Editors and Affiliations

Australian National University, Canberra, ACT, Australia
Tom Gedeon
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee

About this paper

Cite this paper

Xu, H., Su, X., Liu, T., Guo, P., Gao, G., Bao, F. (2019). A Natural Scene Text Extraction Approach Based on Generative Adversarial Learning. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11953. Springer, Cham. https://doi.org/10.1007/978-3-030-36708-4_6

DOI: https://doi.org/10.1007/978-3-030-36708-4_6
Published: 09 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36707-7
Online ISBN: 978-3-030-36708-4
eBook Packages: Computer Science, Computer Science (R0)
