DetectGAN: GAN-based text detector for camera-captured document images

  • Original Paper
  • International Journal on Document Analysis and Recognition (IJDAR)

Abstract

With the spread of camera-equipped electronic devices, camera-based text processing has attracted growing attention. Unlike scene-image recognition, a document-image recognition system must organize its recognition results and store them in a structured document for subsequent data processing. In document images, however, whether characters should be merged into a text line depends largely on their semantic information rather than only on the distance between them, which causes learning confusion during training. In addition, multi-directional printed characters in document images require extra directional information to guide the subsequent recognition task. To avoid learning confusion and obtain recognition-friendly detection results, we propose DetectGAN, a character-level text detection framework based on conditional generative adversarial networks (cGAN). The framework removes position regression and non-maximum suppression (NMS) and recasts text detection directly as an image-to-image generation problem. Experimental results show that our method performs well on text detection in camera-captured document images and outperforms both classical and state-of-the-art algorithms.
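
One way to picture why an image-to-image formulation can do without box regression and NMS: if a generator emits a per-pixel character map, character boxes can be read off as connected components of that map. The sketch below illustrates this idea only; it is not the post-processing described in the paper. The function name `boxes_from_mask`, the 0/1 mask format, and the choice of 4-connectivity are all hypothetical.

```python
from collections import deque


def boxes_from_mask(mask):
    """Return one (top, left, bottom, right) box per 4-connected blob of 1s.

    `mask` is a list of equal-length rows of 0/1 ints, standing in for a
    thresholded generator output (an assumption, not the paper's format).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one blob, tracking its extent.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy 3x6 mask with two separate "characters".
toy_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
print(boxes_from_mask(toy_mask))  # [(0, 1, 1, 2), (1, 4, 2, 4)]
```

In this toy setting each blob yields exactly one box, so there are no overlapping candidates left to suppress.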

Acknowledgements

This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 71621002 and the Key Programs of the Chinese Academy of Sciences under Grant Nos. ZDBS-SSW-JSC003, ZDBS-SSW-JSC004 and ZDBS-SSW-JSC005.

Author information

Corresponding author

Correspondence to Jinyuan Zhao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, J., Wang, Y., Xiao, B. et al. DetectGAN: GAN-based text detector for camera-captured document images. IJDAR 23, 267–277 (2020). https://doi.org/10.1007/s10032-020-00358-w

  • DOI: https://doi.org/10.1007/s10032-020-00358-w
