
Localization of Text in Photorealistic Images

  • Conference paper
Computational Science and Its Applications – ICCSA 2019 (ICCSA 2019)

Abstract

Detection and localization of text in photorealistic images is a difficult and not yet completely solved problem. We propose an approach to this problem based on semantic image segmentation, in which text characters are treated as the objects to be segmented. The paper proposes a network architecture for text localization, describes the procedure for forming the training set, and presents an image pre-processing algorithm that reduces the amount of processed data and simplifies segmentation of the “background” object. The network architecture is a modification of the well-known DeepLabv3 network that takes the specifics of text-character images into account. The proposed method determines the location of text characters in images with acceptable accuracy. Experimental evaluation of text localization quality by the IoU (Intersection over Union) criterion shows that the achieved accuracy is sufficient for subsequent text recognition.
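As a worked illustration of the IoU criterion mentioned above (a minimal sketch only, not the authors' evaluation code; the function name and mask layout are our assumptions), IoU compares a predicted binary mask against a ground-truth mask as intersection area divided by union area:

```python
import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection over Union of two binary segmentation masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    inter = np.logical_and(pred, gt).sum()
    return float(inter) / float(union)

# Example: two 4x4 masks whose 2x2 text regions overlap on 2 pixels
pred = np.zeros((4, 4), dtype=np.uint8)
gt = np.zeros((4, 4), dtype=np.uint8)
pred[0:2, 0:2] = 1   # 4 predicted pixels
gt[1:3, 0:2] = 1     # 4 ground-truth pixels
print(iou(pred, gt))  # 2 / 6 ≈ 0.333
```

A localization is typically counted as correct when its IoU against the ground truth exceeds a fixed threshold (commonly 0.5).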



Acknowledgment

The authors acknowledge Saint-Petersburg State University for a research grant 39417687.

Author information


Corresponding author

Correspondence to Valery Grishkin.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Valery, G., Alexander, E., Oleg, I. (2019). Localization of Text in Photorealistic Images. In: Misra, S., et al. Computational Science and Its Applications – ICCSA 2019. ICCSA 2019. Lecture Notes in Computer Science(), vol 11622. Springer, Cham. https://doi.org/10.1007/978-3-030-24305-0_63


  • DOI: https://doi.org/10.1007/978-3-030-24305-0_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24304-3

  • Online ISBN: 978-3-030-24305-0

  • eBook Packages: Computer Science (R0)
