ABSTRACT
In this paper, we improve natural scene text detection and recognition based on 2D attention and the encoder-decoder framework. First, we review related work on text detection and recognition in different natural scenes. Second, building on the encoder-decoder framework and a two-dimensional attention module, we improve the model through aggregation and hybridisation. Finally, we discuss and analyze the results and identify possible shortcomings of the model.
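To make the idea of attention over a two-dimensional feature map concrete, the following is a minimal sketch and not the model proposed in this paper. It assumes a simplified dot-product score between a decoder query vector and each spatial position of an H x W x C feature map, normalizes the scores with a softmax taken jointly over all positions, and returns the attention-weighted feature vector (the "glimpse"). The function name `attend_2d`, the toy feature map, and the dot-product scoring are illustrative assumptions.

```python
import math


def attend_2d(feature_map, query):
    """Simplified 2D attention (assumed dot-product scoring).

    feature_map: H x W x C nested lists of floats.
    query: length-C list of floats (stands in for a decoder state).
    Returns (weights as an H x W nested list, glimpse as a length-C list).
    """
    # Score each spatial position by its dot product with the query.
    scores = [[sum(f * q for f, q in zip(cell, query)) for cell in row]
              for row in feature_map]

    # Softmax over all H*W positions jointly; subtract the max for stability.
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]

    # Glimpse: attention-weighted sum of the feature vectors.
    channels = len(query)
    glimpse = [0.0] * channels
    for row_w, row_f in zip(weights, feature_map):
        for w, cell in zip(row_w, row_f):
            for c in range(channels):
                glimpse[c] += w * cell[c]
    return weights, glimpse


# Toy 2 x 2 feature map with 2 channels (values are illustrative only).
toy_map = [[[1.0, 0.0], [0.0, 1.0]],
           [[0.0, 0.0], [5.0, 0.0]]]
weights, glimpse = attend_2d(toy_map, [1.0, 0.0])
```

In this toy case the bottom-right position has the largest score, so it receives the largest attention weight. An encoder-decoder recognizer would typically recompute such weights at every decoding step, using the current decoder state as the query.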
Index Terms
- Text Detection and Recognition in Natural Scenes Based on TWO-DIMENSIONAL Attention