
Deep learning approaches to scene text detection: a comprehensive review

Artificial Intelligence Review

Abstract

In recent years, text detection in the wild has improved significantly owing to the success of deep learning models. Applications of computer vision have emerged and been reshaped in this era of deep learning. Over the last decade, the research community has witnessed drastic changes in text detection from natural scene images in terms of approach, coverage, and performance, driven by advances in deep neural network based models. In this paper, we present (1) a comprehensive review of deep learning approaches to scene text detection, (2) suitable deep frameworks for this task, followed by a critical analysis, (3) a categorical study of publicly available scene image datasets and the applicable standard evaluation protocols, with their pros and cons, and (4) comparative results and analysis of the reported methods. Moreover, based on this review and analysis, we identify possible future directions and thrust areas of deep learning approaches to text detection from natural scene images on which upcoming researchers may focus.




References

  • Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S (2016) Tensorflow: large-scale machine learning on heterogeneous distributed systems. In: arXiv:1603.04467

  • Ansari GJ, Shah JH, Yasmin M, Sharif M, Fernandes SL (2018) A novel machine learning approach for scene text extraction. Future Gener Comput Syst 87:328–340

    Article  Google Scholar 

  • Baek Y, Lee B, Han D, Yun S, Lee H (2019) Character region awareness for text detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9365–9374

  • Bagri N, Johari PK (2015) A comparative study on feature extraction using texture and shape for content based image retrieval. Int J Adv Sci Technol 80(4):41–52

    Article  Google Scholar 

  • Bai B, Yin F, Liu CL (2013) Scene text localization using gradient local correlation. In: 12th international conference on document analysis and recognition, pp 1380–1384

  • Bastien F, Lamblin P, Pascanu R, Bergstra J, Goodfellow I, Bergeron A, Bouchard N, Warde-Farley D, Bengio Y (2012) Theano: new features and speed improvements. In: arXiv:1211.5590

  • Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the CLEAR MOT metrics. J Image Video Process 1

  • Busta M, Neumann L, Matas J (2017) Deep textspotter: an end-to-end trainable scene text localization and recognition framework. In: Proceedings of the IEEE international conference on computer vision, pp 2204–2212

  • Ch’ng CK, Chan CS (2017) Total-text: a comprehensive dataset for scene text detection and recognition. In: 14th international conference on document analysis and recognition, pp 935–942

  • Ch’ng CK, Chan CS, Liu CL (2019) Total-text: toward orientation robustness in scene text detection. In: International journal on document analysis and recognition, pp 1–22 (In press)

  • Chen X, Yuille AL (2004) Detecting and reading text in natural scenes. In: IEEE conference on computer vision and pattern recognition, vol 2, pp II–II

  • Chen H, Tsai SS, Schroth G, Chen DM, Grzeszczuk R, Girod B (2011) Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: 18th IEEE international conference on image processing, pp 2609–2612

  • Cho H, Sung M, Jun B (2016) Canny text detector: fast and robust scene text localization algorithm. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3566–3573

  • CIFAR-10 Dataset. https://www.cs.toronto.edu/~kriz/cifar.html. Accessed on 14 June 2020

  • Coates A, Carpenter B, Case C, Satheesh S, Suresh B, Wang T, Wu DJ, Ng AY (2011) Text detection and character recognition in scene images with unsupervised feature learning. In: IEEE international conference on document analysis and recognition, pp 440–445

  • da Silveira TL, Kozakevicius AJ, Rodrigues CR (2017) Single-channel EEG sleep stage classification based on a streamlined set of statistical features in wavelet domain. Med Biol Eng Comput 55(2):343–352

    Article  Google Scholar 

  • Dai Y, Huang Z, Gao Y, Xu Y, Chen K, Guo J, Qiu W (2018) Fused text segmentation networks for multi-oriented scene text detection. In: 24th international conference on pattern recognition, pp 3604–3609

  • Deng D, Liu H, Li X, Cai D (2018) Pixellink: detecting scene text via instance segmentation. In: 32th international conference of atrificial intelligence AAAI, pp 6773–6780

  • Dey S, Shivakumara P, Raghunandan KS, Pal U, Lu T, Kumar GH, Chan CS (2017) Script independent approach for multi-oriented text detection in scene image. Neurocomputing 242:96–112

    Article  Google Scholar 

  • Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: IEEE computer society conference on computer vision and pattern recognition, pp 2963–2970

  • Everingham M, Eslami SA, Van Gool L, Williams CK, Winn J, Zisserman A (2015) The pascal visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136

    Article  Google Scholar 

  • Fathi A, Wojna Z, Rathod V, Wang P, Song HO, Guadarrama S, Murphy KP (2017) Semantic instance segmentation via deep metric learning. In: arXiv:1703.10277

  • Feng W, He W, Yin F, Zhang XY, Liu CL (2019) TextDragon: an end-to-end framework for arbitrary shaped text spotting. In: Proceedings of the IEEE international conference on computer vision, pp 9076–9085

  • Fogel I, Sagi D (1989) Gabor filters as texture discriminator. Biol Cybern 61(2):103–113

    Article  Google Scholar 

  • Francis LM, Sreenath N (2017) TEDLESS–Text detection using least-square SVM from natural scene. J King Saud Univ Comput Inf Sci 29(4)

  • Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) DSSD: deconvolutional single shot detector. In: arXiv:1701.06659

  • Gao J, Wang Q, Yuan Y (2019) Convolutional regression network for multi-oriented text detection. IEEE Access 7:96424–96433

    Article  Google Scholar 

  • Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

  • Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

  • Gllavata J, Ewerth R, Freisleben B (2004) Text detection in images based on unsupervised classification of high-frequency wavelet coefficients. In: 17th international conference on pattern recognition, vol 1, pp 425–428

  • Google Street View. http://maps.google.com

  • Greenhalgh J, Mirmehdi M (2012) Real-time detection and recognition of road traffic signs. IEEE Trans Intell Transp Syst 13(4):1498–1506

    Article  Google Scholar 

  • Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: IEEE conference on computer vision and pattern recognition, pp 2315–2324

  • He T, Huang W, Qiao Y, Yao J (2016a) Text-attentional convolutional neural network for scene text detection. IEEE Trans Image Process 25(6):2529–2541

    Article  MathSciNet  MATH  Google Scholar 

  • He K, Zhang X, Ren S, Sun J (2016b) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  • He D, Yang X, Liang C, Zhou Z, Ororbi AG, Kifer D, Lee Giles C (2017a) Multi-scale FCN with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild. In: IEEE conference on computer vision and pattern recognition, pp 3519–3528

  • He P, Huang W, He T, Zhu Q, Qiao Y, Li X (2017b) Single shot text detector with regional attention. In: IEEE international conference on computer vision, pp 3047–3055

  • He W, Zhang XY, Yin F, Liu CL (2017c) Deep direct regression for multi-oriented scene text detection. In: IEEE international conference on computer vision, pp 745–753

  • He K, Gkioxari G, Dollár P, Girshick R (2017d) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969

  • He T, Tian Z, Huang W, Shen C, Qiao Y, Sun C (2018a) An end-to-end textspotter with explicit alignment and attention. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5020–5029

  • He W, Zhang XY, Yin F, Liu CL (2018b) Multi-oriented and multi-lingual scene text detection with direct regression. IEEE Trans Image Process 27(11):5406–5419

    Article  MathSciNet  Google Scholar 

  • He W, Zhang XY, Yin F, Luo Z, Ogier JM, Liu CL (2020) Realtime multi-scale scene text detection with scale-based region proposal network. Pattern Recognit 98:107026

    Article  Google Scholar 

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

    Article  Google Scholar 

  • Huang X (2019) Automatic video scene text detection based on saliency edge map. Multimed Tools Appl 78(24):34819–34838

    Article  Google Scholar 

  • Huang W, Lin Z, Yang J, Wang J (2013) Text localization in natural images using stroke feature transform and text covariance descriptors. In: IEEE international conference on computer vision, pp 1241–1248

  • Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced mser trees. In: European conference on computer vision, pp 497–511

  • Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: unifying landmark localization with end to end object detection. In: arXiv:1509.04874

  • Huang Z, Zhong Z, Sun L, Huo Q (2019) Mask R-CNN with pyramid attention network for scene text detection. In: 2019 IEEE winter conference on applications of computer vision, pp 764–772

  • Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20

    Article  MathSciNet  Google Scholar 

  • Jeon M, Jeong YS (2020) Compact and accurate scene text detector. Appl Sci 10(6):2096

    Article  Google Scholar 

  • Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: 22nd international conference on multimedia, pp 675–678

  • Jiang Y, Zhu X, Wang X, Yang S, Li W, Wang H, Fu P, Luo Z (2017) R2CNN: rotational region CNN for orientation robust scene text detection. In: arXiv:1706.09579

  • Jiang M, Cheng J, Chen M, Ku X (2018) An improved text localization method for natural scene images. J Phys 960(1):012027

    Google Scholar 

  • Jiao L, Zhang F, Liu F, Yang S, Li L, Feng Z, Qu R (2019) A survey of deep learning-based object detection. IEEE Access 7:128837–128868

    Article  Google Scholar 

  • Joan SF, Valli S (2019) A survey on text information extraction from born-digital and scene text images. Proc Natl Acad Sci India Sect A 89(1):77–101

    Article  Google Scholar 

  • Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda LG, Mestre SR, Mas J, Mota DF, Almazan JA, De Las Heras LP (2011) ICDAR 2011 robust reading competition. In: 12th international conference on document analysis and recognition, pp 1484–1493

  • Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda LG, Mestre SR, Mas J, Mota DF, Almazan JA, De Las Heras LP (2013) ICDAR 2013 robust reading competition. In: 12th international conference on document analysis and recognition, pp 1484–1493

  • Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar VR, Lu S, Shafait F (2015) ICDAR 2015 competition on robust reading. In: 13th international conference on document analysis and recognition, pp 1156–1160

  • Kasturi R, Goldgof D, Soundararajan P, Manohar V, Garofolo J, Bowers R, Boonstra M, Korzhova V, Zhang J (2008) Framework for performance evaluation of face, text, and vehicle detection and tracking in video: data, metrics, and protocol. IEEE Trans Pattern Anal Mach Intell 31(2):319–336

    Article  Google Scholar 

  • Ketkar N (2017) Introduction to keras. In: Deep learning with python, pp 97–111

  • Khan T, Mollah AF (2019a) Distance transform-based stroke feature descriptor for text non-text classification. In: Recent developments in machine learning and data analytics, pp 189–200

  • Khan T, Mollah AF (2019b) AUTNT-A component level dataset for text non-text classification and benchmarking with novel script invariant feature descriptors and D-CNN. Multimed Tools Appl 78(22):32159–32186

    Article  Google Scholar 

  • Khan FA, Tahir MA, Khelifi F, Bouridane A, Almotaeryi R (2017) Robust off-line text independent writer identification using bagged discrete cosine transform features. Expert Syst Appl 71:404–415

    Article  Google Scholar 

  • Kim KH, Hong S, Roh B, Cheon Y, Park M (2016) Pvanet: deep but lightweight neural networks for real-time object detection. In: arXiv:1608.08021

  • Kobchaisawat T, Chalidabhongse TH, Satoh SI (2020) Scene text detection with polygon offsetting and border augmentation. Electronics 9(1):117

    Article  Google Scholar 

  • Kong S, Fowlkes CC (2018) Recurrent pixel embedding for instance grouping. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9018–9028

  • Koo HI, Kim DH (2013) Scene text detection via connected component clustering and nontext filtering. IEEE Trans Image Process 22(6):2296–2305

    Article  MathSciNet  MATH  Google Scholar 

  • Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  • LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. IEEE 86(11):2278–2324

    Article  Google Scholar 

  • Lee S, Cho MS, Jung K, Kim JH (2010) Scene text extraction with edge constraint and text collinearity. In: 20th international conference on pattern recognition, pp 3983–3986

  • Lee JJ, Lee PH, Lee SW, Yuille A, Koch C (2011a) Adaboost for text detection in natural scene. In: 2011 International conference on document analysis and recognition, pp 429–434

  • Lee JJ, Lee PH, Lee SW, Yuille A, Koch C (2011b) Adaboost for text detection in natural scene. In: International conference on document analysis and recognition, pp 429–434

  • Lee CY, Baek Y, Lee H (2019) TedEval: a fair evaluation metric for scene text detectors. In: arXiv:1907.01227

  • Leibe B, Matas J, Sebe N, Welling M (eds) (2016) Computer vision—ECCV 2016. In: 14th European conference, vol 9908

  • Li Y, Lu H (2012) Scene text detection via stroke width. In: 21st international conference on pattern recognition, pp 681–684

  • Li H, Wang P, Shen C (2017) Towards end-to-end text spotting with convolutional recurrent neural networks. In: Proceedings of the IEEE international conference on computer vision, pp 5238–5246

  • Li X, Wang W, Hou W, Liu RZ, Lu T, Yang J (2018) Shape robust text detection with progressive scale expansion network. In: arXiv:1806.02559

  • Liang J, Phillips IT, Haralick RM (1997) Performance evaluation of document layout analysis algorithms on the UW data set. Int Soc Opt Photonics Doc Recognit 3027:149–160

    Google Scholar 

  • Liang G, Shivakumara P, Lu T, Tan CL (2015) A new wavelet-Laplacian method for arbitrarily-oriented character segmentation in video text lines. In: 13th international conference on document analysis and recognition, pp 926–930

  • Liao M, Shi B, Bai X, Wang X, Liu W (2017) TextBoxes: a fast text detector with a single deep neural network. In: International conference of AAAI, pp 4161–4167

  • Liao M, Shi B, Bai X (2018a) Textboxes++: a single-shot oriented scene text detector. IEEE Trans Image Process 27(8):3676–3690

    Article  MathSciNet  MATH  Google Scholar 

  • Liao M, Zhu Z, Shi B, Xia GS, Bai X (2018b) Rotation-sensitive regression for oriented scene text detection. In: IEEE conference on computer vision and pattern recognition, pp 5909–5918

  • Liao M, Lyu P, He M, Yao C, Wu W, Bai X (2019a) Mask textspotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. In: IEEE transactions on pattern analysis and machine intelligence. https://doi.org/10.1109/tpami.2019.2937086

  • Liao M, Wan Z, Yao C, Chen K, Bai X (2019b) Real-time scene text detection with differentiable binarization. In: arXiv:1911.08947

  • Liao M, Song B, Long S, He M, Yao C, Bai X (2020) SynthText3D: synthesizing scene text images from 3D virtual worlds. Sci China Inf Sci 63(2):120105

    Article  Google Scholar 

  • Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: European conference on computer vision, pp 740–755

  • Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: IEEE conference on computer vision and pattern recognition, pp 2117–2125

  • Lin H, Yang P, Zhang F (2019) Review of scene text detection and recognition. In: Archives of computational methods in engineering, pp 1–22

  • Liu Y, Jin L (2017) Deep matching prior network: toward tighter multi-oriented text detection. In: IEEE international conference on computer vision and pattern recognition, pp 3454–3461

  • Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016a) SSD: single shot multibox detector. In: European conference on computer vision, pp 21–37

  • Liu L, Lao S, Fieguth PW, Guo Y, Wang X, Pietikäinen M (2016b) Median robust extended local binary pattern for texture classification. IEEE Trans Image Process 25(3):1368–1381

    Article  MathSciNet  MATH  Google Scholar 

  • Liu L, Fieguth P, Guo Y, Wang X, Pietikäinen M (2017) Local binary features for texture classification: taxonomy and experimental study. Pattern Recognit 62:135–160

    Article  Google Scholar 

  • Liu Z, Lin G, Yang S, Feng J, Lin W, Goh WL (2018a) Learning markov clustering networks for scene text detection. In: IEEE international conference of computer vision and pattern recognition, pp 6936–6944

  • Liu S, Qi L, Qin H, Shi J, Jia J (2018b) Path aggregation network for instance segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759–8768

  • Liu X, Liang D, Yan S, Chen D, Qiao Y, Yan J (2018c) FOTS: fast oriented text spotting with a unified network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5676–5685

  • Liu Y, Jin L, Zhang S, Luo C, Zhang S (2019a) Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognit 90:337–345

    Article  Google Scholar 

  • Liu Y, Jin L, Xie Z, Luo C, Zhang S, Xie L (2019b) Tightness-aware evaluation protocol for scene text detection. In: IEEE Conference on computer vision and pattern recognition, pp 9612–9620

  • Liu F, Chen C, Gu D, Zheng J (2019c) FTPN: scene text detection with feature pyramid based text proposal network. IEEE Access 7:44219–44228

    Article  Google Scholar 

  • Liu X, Meng G, Pan C (2019d) Scene text detection and recognition with advances in deep learning: a survey. Int J Doc Anal Recognit 22(2):143–162

    Article  Google Scholar 

  • Liu Z, Lin G, Yang S, Liu F, Lin W, Goh WL (2019e) Towards robust curve text detection with conditional spatial expansion. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7269–7278

  • Liu Y, Zhang S, Jin L, Xie L, Wu Y, Wang Z (2019f) Omnidirectional scene text detection with sequential-free box discretization. In: arXiv:1906.02371

  • Liu X, Zhang R, Zhou Y, Jiang Q, Song Q, Li N, Zhou K, Wang L, Wang D, Liao M, Yang M (2019g) ICDAR 2019 robust reading challenge on reading chinese text on signboard. In: arXiv:1912.09641

  • Liu J, Liu X, Sheng J, Liang D, Li X, Liu Q (2019h) Pyramid mask text detector. In: arXiv:1903.11800

  • Liu H, Guo A, Jiang D, Hu Y, Ren B (2020a) PuzzleNet: scene text detection by segment context graph learning. In: arXiv:2002.11371

  • Liu Y, Chen H, Shen C, He T, Jin L, Wang L (2020b) ABCNet: real-time scene text spotting with adaptive bezier-curve network. In: arXiv:2002.10200

  • Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: IEEE international conference on computer vision and pattern recognition, pp 3431–3440

  • Long S, Ruan J, Zhang W, He X, Wu W, Yao C (2018a) TextSnake: a flexible representation for detecting text of arbitrary shapes. In: European conference on computer vision, pp 20–36

  • Long S, He X, Ya C (2018b) Scene text detection and recognition: the deep learning era. In: arXiv:1811.04256

  • Lu S, Chen T, Tian S, Lim JH, Tan CL (2015) Scene text extraction based on edges and support vector regression. Int J Doc Anal Recognit 18(2):125–135

    Article  Google Scholar 

  • Lucas SM (2005) ICDAR 2005 text locating competition results. In: 8th international conference on document analysis and recognition, pp 80–84

  • Lucas SM, Panaretos A, Sosa L, Tang A, Wong S, Young R (2003) ICDAR 2003 robust reading competitions. In: 7th international conference on document analysis and recognition, pp 682–687

  • Lyu P, Yao C, Wu W, Yan S, Bai X (2018a) Multi-oriented scene text detection via corner localization and region segmentation. In: IEEE conference on computer vision and pattern recognition, pp 7553–7563

  • Lyu P, Liao M, Yao C, Wu W, Bai X (2018b) Mask textspotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the European conference on computer vision, pp 67–83

  • Ma J, Shao W, Ye H, Wang L, Wang H, Zheng Y, Xue X (2018) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed 20(11):3111–3122

    Article  Google Scholar 

  • Ma C, Sun L, Zhong Z, Huo Q (2020) ReLaText: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. In: arXiv:2003.06999

  • Maitra DS, Bhattacharya U, Parui SK (2015) CNN based common approach to handwritten character recognition of multiple scripts. In: 13th international conference on document analysis and recognition, pp 1021–1025

  • Majhi B, Pujari P (2018) On development and performance evaluation of novel odia handwritten digit recognition methods. Arab J Sci Eng 43(8):3887–3901

    Article  Google Scholar 

  • Mallat SG (1989) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell 7:674–693

    Article  MATH  Google Scholar 

  • Manjusha K, Kumar MA, Soman KP (2018) Reduced scattering representation for Malayalam character recognition. Arab J Sci Eng 43(8):4315–4326

    Article  Google Scholar 

  • Mishra A, Alahari K, Jawahar CV (2012) Scene text recognition using higher order language priors. In: HAL

  • Mitchell T (1999) The 20 newsgroups text dataset

  • Mollah AF, Basu S, Nasipuri M (2012) Text detection from camera captured images using a novel fuzzy-based technique. In: 3rd international conference on emerging applications of information technology, pp 291–294

  • Mosleh A, Bouguila N, Hamza AB (2012) Image text detection using a bandlet-based edge detector and stroke width transform. In: British machine vision conference, pp 1–12

  • Nayef N, Yin F, Bizid I, Choi H, Feng Y, Karatzas D, Luo Z, Pal U, Rigaud C, Chazalon J, Khlif W (2017) ICDAR 2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: 14th IAPR international conference on document analysis and recognition, pp 1454–1459

  • Nayef N, Patel Y, Busta M, Chowdhury PN, Karatzas D, Khlif W, Matas J, Pal U, Burie JC, Liu CL, Ogier JM (2019) ICDAR 2019 robust reading challenge on multi-lingual scene text detection and recognition–RRC-MLT-2019. In: IAPR international conference of document analysis and recognition

  • Neumann L, Matas J (2010) A method for text localization and recognition in real-world images. In: Asian conference on computer vision, pp 770–783

  • Neumann L, Matas J (2012) Real-time scene text localization and recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3538–3545

  • Neycharan JG, Ahmadyfard A (2018) Edge color transform: a new operator for natural scene text localization. Multimed Tools Appl 77(6):7615–7636

    Article  Google Scholar 

  • Niconico. http://www.nicovideo.jp

  • Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE international conference on computer vision, pp 1520–1528

  • Ojala T, Pietikäinen M, Harwood D (1996) A comparative study of texture measures with classification based on featured distributions. Pattern Recognit 29(1):51–59

    Article  Google Scholar 

  • Pan YF, Hou X, Liu CL (2010) A hybrid approach to detect and localize texts in natural scene images. IEEE Trans Image Process 20(3):800–813

    MathSciNet  MATH  Google Scholar 

  • Paul S, Saha S, Basu S, Saha PK, Nasipuri M (2019) Text localization in camera captured images using fuzzy distance transform based adaptive stroke filter. Multimed Tools Appl 78(13):18017–18036

    Article  Google Scholar 

  • Qiao L, Tang S, Cheng Z, Xu Y, Niu Y, Pu S, Wu F (2020) Text perceptron: towards end-to-end arbitrary-shaped text spotting. In: arXiv:2002.06820

  • Qin S, Manduchi R (2017) Cascaded segmentation-detection networks for word-level text spotting. In: 14th international conference on document analysis and recognition, pp 1275–1282

  • Qin H, Zhang H, Wang H, Yan Y, Zhang M, Zhao W (2019a) An algorithm for scene text detection using multibox and semantic segmentation. Appl Sci 9(6):1054

    Article  Google Scholar 

  • Qin S, Bissacco A, Raptis M, Fujii Y, Xiao Y (2019b) Towards unconstrained end-to-end text spotting. In: Proceedings of the IEEE international conference on computer vision, pp 4704–4714

  • Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: IEEE conference on computer vision and pattern recognition, pp 779–788

  • Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99

  • Richardson E, Azar Y, Avioz O, Geron N, Ronen T, Avraham Z, Shapiro S (2019) It’s all about the scale–efficient text detection using adaptive scaling. In: arXiv:1907.12122

  • Risnumawan A, Shivakumara P, Chan CS, Tan CL (2014) A robust arbitrary text detection system for natural scene images. Expert Syst Appl 41(18):8027–8048

    Article  Google Scholar 

  • Saha S, Chakraborty N, Kundu S, Paul S, Mollah AF, Basu S, Sarkar R (2020) Multi-lingual scene text detection and language identification. Pattern Recognit Lett 138:16–22

    Article  Google Scholar 

  • Sain A, Bhunia AK, Roy PP, Pal U (2018) Multi-oriented text detection and verification in video frames and scene images. Neurocomputing 275:1531–1549

    Article  Google Scholar 

  • Sherstinsky A (2018) Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. In: arXiv:1808.03314

  • Shi C, Wang C, Xiao B, Zhang Y, Gao S (2013) Scene text detection using graph model built upon maximally stable extremal regions. Pattern Recognit Lett 34(2):107–116

    Article  Google Scholar 

  • Shi B, Bai X, Belongie S (2017a) Detecting oriented text in natural images by linking segments. In: IEEE international conference on computer vision and pattern recognition, pp 2550–2558

  • Shi B, Yao C, Liao M, Yang M, Xu P, Cui L, Belongie S, Lu S, Bai X (2017b) ICDAR 2017 competition on reading chinese text in the wild (rctw-17). In: 14th IAPR international conference on document analysis and recognition, pp 1429–1434

  • Shivakumara P, Phan TQ, Tan CL (2010) A Laplacian approach to multi-oriented text detection in video. IEEE Trans Pattern Anal Mach Intell 33(2):412–419

    Article  Google Scholar 

  • Shivakumara P, Roy S, Jalab HA, Ibrahim RW, Pal U, Lu T, Khare V, Wahab AWBA (2019) Fractional means based method for multi-oriented keyword spotting in video/scene/license plate images. Expert Syst Appl 118:1–19

    Article  Google Scholar 

  • Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. In: arXiv:1409.1556

  • Song X, Wu Y, Wang W, Lu T (2020) TK-text: multi-shaped scene text detection via instance segmentation. In: Proceedings of the international conference on multimedia modeling, pp 201–213

  • Sun Y, Zhang C, Huang Z, Liu J, Han J, Ding E (2018) Textnet: irregular text reading from images with an end-to-end trainable network. In: Proceedings of the Asian conference on computer vision, pp 83–99

  • Sun Y, Liu J, Liu W, Han J, Ding E, Liu J (2019) Chinese street view text: large-scale Chinese text reading with partially supervised learning. In: Proceedings of the IEEE international conference on computer vision, pp 9086–9095

  • Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  • Tang Y, Wu X (2017) Scene text detection and segmentation based on cascaded convolution neural networks. IEEE Trans Image Process 26(3):1509–1520

    Article  Google Scholar 

  • Tang Y, Wu X (2018) Scene text detection using superpixel-based stroke feature transform and deep learning based region classification. IEEE Trans Multimed 20(9):2276–2288

    Article  Google Scholar 

  • Tang J, Yang Z, Wang Y, Zheng Q, Xu Y, Bai X (2019) SegLink++: detecting dense and arbitrary-shaped scene text by instance-aware component grouping. In: Pattern recognition, vol 96, pp 106954

  • Tian Z, Huang W, He T, He P, Qiao Y (2016a) Detecting text in natural image with connectionist text proposal network. In: European conference on computer vision, pp 56–72

  • Tian S, Bhattacharya U, Lu S, Su B, Wang Q, Wei X, Lu Y, Tan CL (2016b) Multilingual scene character recognition with co-occurrence of histogram of oriented gradients. Pattern Recognit 51:125–134

    Article  Google Scholar 

  • Tian Z, Shu M, Lyu P, Li R, Zhou C, Shen X, Jia J (2019) Learning shape-aware embedding for scene text detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4234–4243

  • Tychsen-Smith L, Petersson L (2017) Denet: scalable real-time object detection with directed sparse sampling. In: IEEE international conference of computer vision, pp 428–436

  • Van Dongen SM (2000) Graph clustering by flow simulation (Doctoral dissertation)

  • Veit A, Matera T, Neumann L, Matas J, Belongie S (2016) Coco-text: Dataset and benchmark for text detection and recognition in natural images. In: arXiv:1601.07140

  • Wang K, Belongie S (2010) Word spotting in the wild. In: European conference on computer vision, pp 591–604

  • Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: IEEE international conference on computer vision, pp 1457–1464

  • Wang T, Wu DJ, Coates A, Ng AY (2012) End-to-end text recognition with convolutional neural networks. In: 21st international conference on pattern recognition, pp 3304–3308

  • Wang X, Chen K, Huang Z, Yao C, Liu W (2017) Point linking network for object detection. In: arXiv:1706.03646

  • Wang K, Li G, Liu X, Yan J, Li S, Huang H (2018) Natural scene text detection based on MSER. In: 3rd international conference on communications, information management and network security

  • Wang X, Feng X, Xia Z (2019a) Scene video text tracking based on hybrid deep text detection and layout constraint. Neurocomputing 363:223–235

  • Wang W, Xie E, Song X, Zang Y, Wang W, Lu T, Yu G, Shen C (2019b) Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE international conference on computer vision, pp 8440–8449

  • Wang P, Zhang C, Qi F, Huang Z, En M, Han J, Liu J, Ding E, Shi G (2019c) A single-shot arbitrarily-shaped text detector based on context attended multi-task learning. In: Proceedings of the 27th ACM international conference on multimedia, pp 1277–1285

  • Wang X, Jiang Y, Luo Z, Liu CL, Choi H, Kim S (2019d) Arbitrary shape scene text detection with adaptive text region representation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6449–6458

  • Wang Y, Xie H, Fu Z, Zhang Y (2019e) DSRN: a deep scale relationship network for scene text detection. In: Proceedings of the 28th international joint conference on artificial intelligence. AAAI Press, pp 947–953

  • Wang H, Lu P, Zhang H, Yang M, Bai X, Xu Y, He M, Wang Y, Liu W (2019f) All you need is boundary: toward arbitrary-shaped text spotting. In: arXiv:1911.09550

  • Wang S, Liu Y, He Z, Wang Y, Tang Z (2020a) A quadrilateral scene text detector with two-stage network architecture. Pattern Recognit 102:107230

  • Wang Y, Xie H, Zha Z, Xing M, Fu Z, Zhang Y (2020b) ContourNet: taking a further step toward accurate arbitrary-shaped scene text detection. In: arXiv:2004.04940

  • Welcome to Lasagne. https://lasagne.readthedocs.io/en/latest/

  • Which GPU(s) to get for deep learning: my experience and advice for using GPUs in deep learning, https://timdettmers.com/2019/04/03/which-gpu-for-deep-learning/. Accessed on 3 June 2020

  • Wolf C, Jolion JM (2006) Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int J Doc Anal Recognit 8(4):280–296

  • Wu Y, Natarajan P (2017) Self-organized text detection with minimal post-processing via border learning. In: IEEE international conference on computer vision, pp 5000–5009

  • Xie E, Zang Y, Shao S, Yu G, Yao C, Li G (2019) Scene text detection with supervised pyramid context network. In: Proceedings of the AAAI conference on artificial intelligence, pp 9038–9045

  • Xu Y, Wang Y, Zhou W, Wang Y, Yang Z, Bai X (2019a) TextField: learning a deep direction field for irregular scene text detection. IEEE Trans Image Process 28(11):5566–5579

  • Xu Y, Duan J, Kuang Z, Yue X, Sun H, Guan Y, Zhang W (2019b) Geometry normalization networks for accurate scene text detection. In: arXiv:1909.00794

  • Xue C, Lu S, Zhang W (2019) MSR: multi-scale shape regression for scene text detection. In: arXiv:1901.02596

  • Yang Q, Cheng M, Zhou W, Chen Y, Qiu M, Lin W, Chu W (2018) IncepText: a new inception-text module with deformable PSROI pooling for multi-oriented scene text detection. In: arXiv:1805.01167

  • Yang P, Zhang F, Yang G (2019) A fast scene text detector using knowledge distillation. IEEE Access 7:22588–22598

  • Yang P, Yang G, Gong X, Wu P, Han X, Wu J, Chen C (2020) Instance segmentation network with self-distillation for scene text detection. IEEE Access 8:45825–45836

  • Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: IEEE conference on computer vision and pattern recognition, pp 1083–1090

  • Yao C, Bai X, Sang N, Zhou X, Zhou S, Cao Z (2016) Scene text detection via holistic, multi-channel prediction. In: arXiv:1606.09002

  • Yi C, Tian Y (2011) Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans Image Process 20(9):2594–2605

  • Yi C, Tian Y (2012) Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification. IEEE Trans Image Process 21(9):4256–4268

  • Zamberletti A, Noce L, Gallo I (2014) Text localization based on fast feature pyramids and multi-resolution maximally stable extremal regions. In: Asian conference on computer vision, pp 91–105

  • Zeiler MD, Taylor GW, Fergus R (2011) Adaptive deconvolutional networks for mid and high level feature learning. In: 2011 International conference on computer vision, pp 2018–2025

  • Zhan F, Lu S, Xue C (2018) Verisimilar image synthesis for accurate detection and recognition of texts in scenes. In: Proceedings of the European conference on computer vision, pp 249–266

  • Zhang Z, Zhang C, Shen W, Yao C, Liu W, Bai X (2016) Multi-oriented text detection with fully convolutional networks. In: IEEE conference on computer vision and pattern recognition, pp 4159–4167

  • Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: IEEE conference on computer vision and pattern recognition, pp 4203–4212

  • Zhang C, Liang B, Huang Z, En M, Han J, Ding E, Ding X (2019) Look more than once: an accurate detector for text of arbitrary shapes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 10552–10561

  • Zhong Z, Jin L, Zhang S, Feng Z (2016) DeepText: a unified framework for text proposal generation and text detection in natural images. In: arXiv:1605.07314

  • Zhong Z, Sun L, Huo Q (2019a) An anchor-free region proposal network for Faster R-CNN based text detection approaches. Int J Doc Anal Recognit 22(3):315–327

  • Zhong Z, Sun L, Huo Q (2019b) Improved localization accuracy by LocNet for Faster R-CNN based text detection in natural scene images. Pattern Recognit, p 106986

  • Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J (2017) EAST: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5551–5560

  • Zhu Y, Yao C, Bai X (2016) Scene text detection and recognition: recent advances and future trends. Front Comput Sci 10(1):19–36

  • Zhu Y, Ma C, Du J (2019) Rotated cascade R-CNN: a shape robust detector with coordinate regression. Pattern Recognit 96

Acknowledgements

The authors are grateful to the Department of Computer Science and Engineering, Aliah University, for providing the necessary support to carry out this work. Tauseef Khan is further grateful to the University Grants Commission (UGC), Govt. of India, for granting financial support under the Maulana Azad National Fellowship scheme.

Author information

Corresponding author

Correspondence to Ayatullah Faruk Mollah.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khan, T., Sarkar, R. & Mollah, A.F. Deep learning approaches to scene text detection: a comprehensive review. Artif Intell Rev 54, 3239–3298 (2021). https://doi.org/10.1007/s10462-020-09930-6

  • DOI: https://doi.org/10.1007/s10462-020-09930-6
