Abstract
With the proliferation of camera-equipped electronic devices, camera-based text processing has attracted increasing attention. Unlike scene-image recognition, a document-image recognition system must organize its recognition results and store them in a structured document for subsequent data processing. In document images, however, merging characters into text lines depends largely on their semantic information rather than merely on the distance between characters, which causes learning confusion during training. Moreover, multi-oriented printed characters in document images require additional orientation information to guide subsequent recognition. To avoid learning confusion and obtain recognition-friendly detection results, we propose DetectGAN, a character-level text detection framework based on conditional generative adversarial networks (cGANs). The proposed framework removes position regression and the NMS step, recasting text detection directly as an image-to-image generation problem. Experimental results show that our method performs strongly on text detection in camera-captured document images and outperforms classical and state-of-the-art algorithms.
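For context, a conditional GAN in the image-to-image setting optimizes a minimax objective of roughly the following standard form; this is the generic cGAN formulation, and the exact loss used by DetectGAN may differ and is not specified in the abstract:

```latex
\min_G \max_D \; \mathcal{L}_{\mathrm{cGAN}}(G, D)
  = \mathbb{E}_{x,y}\!\left[\log D(x, y)\right]
  + \mathbb{E}_{x}\!\left[\log\!\big(1 - D(x, G(x))\big)\right]
```

Here \(x\) would denote the input document image, \(y\) the ground-truth character-level detection map, and \(G(x)\) the generated detection map that the discriminator \(D\) learns to distinguish from \(y\).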
Acknowledgements
This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 71621002 and the Key Programs of the Chinese Academy of Sciences under Grant Nos. ZDBS-SSW-JSC003, ZDBS-SSW-JSC004 and ZDBS-SSW-JSC005.
Cite this article
Zhao, J., Wang, Y., Xiao, B. et al. DetectGAN: GAN-based text detector for camera-captured document images. IJDAR 23, 267–277 (2020). https://doi.org/10.1007/s10032-020-00358-w