DOI: 10.1145/3577164.3577165
research-article

Text Detection and Recognition in Natural Scenes Based on TWO-DIMENSIONAL Attention

Published: 04 April 2023

ABSTRACT

In this paper, we improve natural scene text detection and recognition based on 2D attention and an encoder-decoder framework. First, we discuss related work on text detection and recognition across different natural scenes. Second, we build on the encoder-decoder framework and the two-dimensional attention module, and improve it through aggregation and hybridisation. Finally, we discuss and analyze the results and identify possible shortcomings of the model.

    • Published in

      ACM Other conferences
      VSIP '22: Proceedings of the 2022 4th International Conference on Video, Signal and Image Processing
      November 2022
      165 pages
      ISBN: 9781450397810
      DOI: 10.1145/3577164

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited