DOI: 10.1145/3522749.3523074
research-article

An efficient scene text detection neural network

Published: 13 April 2022

ABSTRACT

We introduce a text detection neural network that accurately locates text in a variety of complex environments and outputs the best-fitting rectangle around it. The network has three parts. The first is a backbone built from a residual network, which refines the feature map. The second is a sequence module built from a transformer, which processes the feature map row by row in order to mine the context of the characters in the image. The third is a multi-scale detection module, which detects the best target box from feature maps of different sizes. The residual backbone guards against gradient explosion during backpropagation. Because information is consistent across grid cells in the same row, the transformer module pays more attention to the text line. The detection module applies multiple anchors in the vertical direction simultaneously, which achieves good results in both speed and accuracy. In experiments on ICDAR 2015, a dataset commonly used in text detection, the network achieves an F-score of 0.7.
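
The idea of placing several anchors in the vertical direction on feature maps of different sizes can be illustrated with a minimal sketch. The abstract does not give the anchor layout, so everything here is an assumption made for illustration: the function names `vertical_anchors` and `multi_scale_anchors`, the fixed anchor width of one grid cell, and the strides and heights in the usage example are hypothetical and are not the authors' configuration.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in input-image pixels
Box = Tuple[float, float, float, float]


def vertical_anchors(grid_h: int, grid_w: int, stride: int,
                     heights: List[float]) -> List[Box]:
    """Return one anchor per (grid cell, height) pair.

    Assumption: every anchor is exactly one cell wide (stride pixels)
    and centred on its cell, so only the height varies.
    """
    boxes: List[Box] = []
    for row in range(grid_h):
        for col in range(grid_w):
            cx = (col + 0.5) * stride  # cell centre, x
            cy = (row + 0.5) * stride  # cell centre, y
            for h in heights:
                boxes.append((cx - stride / 2, cy - h / 2,
                              cx + stride / 2, cy + h / 2))
    return boxes


def multi_scale_anchors(image_size: int, strides: List[int],
                        heights_per_scale: List[List[float]]) -> List[Box]:
    """Collect anchors from several feature maps of a square image.

    Each stride corresponds to one feature-map size
    (image_size // stride cells per side).
    """
    all_boxes: List[Box] = []
    for stride, heights in zip(strides, heights_per_scale):
        grid = image_size // stride
        all_boxes.extend(vertical_anchors(grid, grid, stride, heights))
    return all_boxes


if __name__ == "__main__":
    # Hypothetical setup: a 64x64 image, two feature maps (strides 8 and 16),
    # two anchor heights per map.
    anchors = multi_scale_anchors(64, [8, 16], [[8, 16], [32, 64]])
    print(len(anchors), anchors[0])
```

In this sketch the finer feature map carries the shorter anchors and the coarser one the taller anchors, so each scale handles a different range of text heights while the anchor width stays tied to the grid cell.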

  • Published in

    ACM Other conferences
    CCEAI '22: Proceedings of the 6th International Conference on Control Engineering and Artificial Intelligence
    March 2022
    130 pages
    ISBN: 9781450385916
    DOI: 10.1145/3522749

    Copyright © 2022 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited