Text Detection and Recognition in Natural Scenes Based on TWO-DIMENSIONAL Attention

Author:
Baicun Guo

Computer Science, North China Electric Power University, China

0000-0002-2755-2687

VSIP '22: Proceedings of the 2022 4th International Conference on Video, Signal and Image Processing, November 2022, Pages 1–6. https://doi.org/10.1145/3577164.3577165

Published: 04 April 2023

ABSTRACT

In this paper, we improve natural scene text detection and recognition based on 2D attention and an encoder-decoder framework. First, we discuss related work on text detection and recognition in different natural scenes. Second, we build on the encoder-decoder framework and the two-dimensional attention module, and improve it through aggregation and hybridisation. Finally, we discuss and analyze the results and identify possible shortcomings of the model.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation

Published in

VSIP '22: Proceedings of the 2022 4th International Conference on Video, Signal and Image Processing
November 2022
165 pages
ISBN:9781450397810
DOI:10.1145/3577164

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
aggregation
encoder-decoder
two-dimensional attention
Qualifiers
- research-article
- Research
- Refereed limited