Abstract
Uyghur text detection is crucial to a variety of real-world applications, yet little research has focused on it. In this paper, we develop an effective and efficient region-based convolutional neural network for detecting Uyghur text in complex background images. The network has three main characteristics: (1) three region proposal networks, operating simultaneously on feature maps from different convolutional layers, are used to improve recall; (2) the overall architecture is a fully convolutional network, in which global average pooling replaces the fully connected layers in the classification and bounding box regression layers; (3) to make full use of the baseline information, Uyghur text lines are detected directly by the network in an end-to-end fashion. Experimental results on a benchmark dataset show that our method achieves an F-measure of 0.83 with a detection time of 0.6 s per image on a single K20c GPU. This is much faster than state-of-the-art methods while maintaining competitive accuracy.
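Characteristic (2), replacing fully connected layers with global average pooling, can be illustrated with a small pure-Python sketch. This is not the authors' Caffe implementation. The function names (`conv1x1`, `global_average_pool`, `gap_head`), the single-RoI nested-list layout, and the two-class (background/text) output are assumptions made for illustration. The idea, as introduced in Network in Network (Lin et al.), is to produce one feature map per output value with a 1×1 convolution and then average each map over its spatial extent, so no fully connected weights are needed.

```python
import math
from typing import List, Tuple

# A feature map is stored as nested lists indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def conv1x1(x: FeatureMap, weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: each output channel is a per-pixel linear combination
    of the input channels (weights[k][c] maps input channel c to output k)."""
    if len(weights) != len(bias):
        raise ValueError("need one bias per output channel")
    in_c, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for w_k, b_k in zip(weights, bias):
        if len(w_k) != in_c:
            raise ValueError("weight row length must equal the number of input channels")
        out.append([[b_k + sum(w_k[c] * x[c][i][j] for c in range(in_c))
                     for j in range(w)]
                    for i in range(h)])
    return out


def global_average_pool(x: FeatureMap) -> List[float]:
    """Average each H x W channel down to a single value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def softmax(z: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def gap_head(roi: FeatureMap,
             cls_w: List[List[float]], cls_b: List[float],
             reg_w: List[List[float]], reg_b: List[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical FC-free head for one pooled RoI.

    cls_w has one row per class (e.g. background, text); reg_w has one row per
    regressed box offset. Returns (class probabilities, box offsets).
    """
    scores = global_average_pool(conv1x1(roi, cls_w, cls_b))
    offsets = global_average_pool(conv1x1(roi, reg_w, reg_b))
    return softmax(scores), offsets


if __name__ == "__main__":
    # Toy 2-channel, 2x2 RoI feature map; a single class-agnostic box is regressed.
    roi = [[[1.0, 2.0], [3.0, 4.0]],
           [[0.0, 0.0], [0.0, 2.0]]]
    probs, offsets = gap_head(roi,
                              cls_w=[[0.0, 1.0], [1.0, 0.0]], cls_b=[0.0, 0.0],
                              reg_w=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
                              reg_b=[0.0, 0.0, 0.0, 0.1])
    print(probs, offsets)
```

Because both the 1×1 convolution and the spatial average are linear, the pooled outputs equal the same weights applied to the per-channel means. The parameter saving comes from not needing a separate weight for every spatial position, as a fully connected layer over a flattened feature map would.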

Notes
This is an approximate joint training method, because it ignores the derivatives with respect to the proposal coordinates, as discussed in [24] (Ren et al., Faster R-CNN).
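The approximation can be stated with the chain rule. Writing $\theta$ for the shared network parameters, $b(\theta)$ for the proposal coordinates produced by the region proposal networks, and $L$ for the detection loss (symbols introduced here for illustration), the exact gradient has two terms. Approximate joint training keeps only the first, treating the proposals as constants during back-propagation:

```latex
\frac{dL}{d\theta}
  = \left.\frac{\partial L}{\partial \theta}\right|_{b}
  + \frac{\partial L}{\partial b}\,\frac{\partial b}{\partial \theta}
\qquad\Longrightarrow\qquad
\frac{dL}{d\theta} \approx \left.\frac{\partial L}{\partial \theta}\right|_{b}
```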
References
Ahmad AMA, Alqutami A, Atoum J (2012) A robust algorithm for Arabic video text detection. In: Proceedings of the 2011 2nd international congress on computer applications and computational science. Springer, pp 261–266
Bai J, Chen Z, Feng B, Xu B (2014) Chinese image text recognition on grayscale pixels. In: ICASSP. IEEE, pp 1380–1384
Bai J, Chen Z, Feng B, Xu B (2014) Image character recognition using deep convolutional neural network learned from different languages. In: ICIP. IEEE, pp 2560–2564
Chen J, Song Y, Xie H, Chen X, Deng H, Liu Y (2016) Robust Uyghur text localization in complex background images. In: PCM, volume 9917 of lecture notes in computer science. Springer, pp 406–416
Chen Z, Chen Y, Gao X, Wang S, Hu L, Yan CC, Lane ND, Miao C (2015) Unobtrusive sensing incremental social contexts using fuzzy class incremental learning. In: ICDM. IEEE Computer Society, pp 71–80
Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: CVPR. IEEE Computer Society, pp 2963–2970
Girshick RB (2015) Fast R-CNN. In: ICCV. IEEE Computer Society, pp 1440–1448
Girshick RB, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR. IEEE Computer Society, pp 580–587
Halima MB, Karray H, Alimi AM (2010) A comprehensive method for Arabic video text detection, localization, extraction and recognition. In: PCM, volume 6298 of lecture notes in computer science. Springer, pp 648–659
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. CoRR, arXiv:1512.03385
He T, Huang W, Qiao Y, Yao J (2016) Text-attentional convolutional neural network for scene text detection. IEEE Trans Image Process 25(6):2529–2541
Huang W, Lin Z, Yang J, Wang J (2013) Text localization in natural images using stroke feature transform and text covariance descriptors. In: ICCV. IEEE Computer Society, pp 1241–1248
Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced MSER trees. In: ECCV, volume 8692 of lecture notes in computer science. Springer, pp 497–511
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20
Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: ECCV, volume 8692 of lecture notes in computer science. Springer, pp 512–528
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick RB, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: ACM multimedia. ACM, pp 675–678
Kang L, Li Y, Doermann DS (2014) Orientation robust text line detection in natural images. In: CVPR. IEEE Computer Society, pp 4034–4041
Karatzas D, Shafait F, Uchida S, Iwamura M, Gomez i Bigorda L, Mestre SR, Mas J, Mota DF, Almazán J, de las Heras L-P (2013) ICDAR 2013 robust reading competition. In: ICDAR. IEEE Computer Society, pp 1484–1493
Lin M, Chen Q, Yan S (2013) Network in network. CoRR, arXiv:1312.4400
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: CVPR. IEEE Computer Society, pp 3431–3440
Moradi M, Mozaffari S, Orouji AA (2010) Farsi/Arabic text extraction from video images by corner detection. In: 2010 6th Iranian conference on machine vision and image processing. IEEE, pp 1–6
Neumann L, Matas J (2012) Real-time scene text localization and recognition. In: CVPR. IEEE Computer Society, pp 3538–3545
Neumann L, Matas J (2016) Real-time lexicon-free scene text localization and recognition. IEEE Trans Pattern Anal Mach Intell 38(9):1872–1885
Ren S, He K, Girshick RB, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp 91–99
Saudagar AKJ, Mohammed HV, Iqbal K, Gyani YJ (2015) Efficient Arabic text extraction and recognition using thinning and dataset comparison technique. In: 2015 international conference on communication, information & computing technology (ICCICT). IEEE, pp 1–5
Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) OverFeat: integrated recognition, localization and detection using convolutional networks. CoRR, arXiv:1312.6229
Shahab A, Shafait F, Dengel A (2011) ICDAR 2011 robust reading competition challenge 2: reading text in scene images. In: ICDAR. IEEE Computer Society, pp 1491–1496
Shivakumara P, Dutta A, Tan CL, Pal U (2014) Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing. Multimed Tools Appl 72(1):515–539
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR, arXiv:1409.1556
Szegedy C, Liu W, Jia Y, Sermanet P, Reed SE, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: CVPR. IEEE Computer Society, pp 1–9
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015) Rethinking the inception architecture for computer vision. CoRR, arXiv:1512.00567
Tian S, Pan Y, Huang C, Lu S, Yu K, Tan CL (2015) Text flow: a unified text detection system in natural scene images. In: ICCV. IEEE Computer Society, pp 4651–4659
Wang K, Babenko B, Belongie SJ (2011) End-to-end scene text recognition. In: ICCV. IEEE Computer Society, pp 1457–1464
Wang T, Wu DJ, Coates A, Ng AY (2012) End-to-end text recognition with convolutional neural networks. In: ICPR. IEEE Computer Society, pp 3304–3308
Wolf C, Jolion J-M (2006) Object count/area graphs for the evaluation of object detection and segmentation algorithms. IJDAR 8(4):280–296
Xie H, Gao K, Zhang Y, Li J, Ren H (2011) Common visual pattern discovery via graph matching. In: ACM multimedia. ACM, pp 1385–1388
Xie H, Gao K, Zhang Y, Li J, Liu Y (2011) Pairwise weak geometric consistency for large scale image search. In: ICMR. ACM, p 42
Xie H, Zhang Y, Gao K, Tang S, Xu K, Guo L, Li J (2013) Robust common visual pattern discovery using graph matching. J Vis Commun Image Represent 24(5):635–646
Xu Z, Hu C, Lin M (2016) Video structured description technology based intelligence analysis of surveillance videos for public security applications. Multimed Tools Appl 75(19):12155–12172
Xu Z, Lin M, Hu C, Liu Y (2016) The big data analytics and applications of the surveillance system using video structured description technology. Clust Comput 19(3):1283–1292
Xu Z, Mei L, Liu Y, Hu C, Chen L (2016) Semantic enhanced cloud environment for surveillance data management using video structural description. Computing 98(1–2):35–54
Yan J, Zhu M, Liu H, Liu Y (2010) Visual saliency detection via sparsity pursuit. IEEE Signal Process Lett 17(8):739–742
Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: CVPR. IEEE Computer Society, pp 1083–1090
Ye Q, Doermann DS (2015) Text detection and recognition in imagery: a survey. IEEE Trans Pattern Anal Mach Intell 37(7):1480–1500
Yin X-C, Pei W-Y, Zhang J, Hao H-W (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937
Yin X-C, Yin X, Huang K, Hao H-W (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
Yousfi S, Berrani S-A, Garcia C (2015) ALIF: a dataset for Arabic embedded text recognition in TV broadcast. In: ICDAR. IEEE Computer Society, pp 1221–1225
Yuan J, Wei B, Liu Y, Zhang Y, Wang L (2015) A method for text line detection in natural images. Multimed Tools Appl 74(3):859–884
Zayene O, Hennebert J, Touj SM, Ingold R, Amara NEB (2015) A dataset for Arabic text detection, tracking and recognition in news videos - AcTiV. In: ICDAR. IEEE Computer Society, pp 996–1000
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: ECCV, volume 8689 of lecture notes in computer science. Springer, pp 818–833
Zhang Z, Shen W, Yao C, Bai X (2015) Symmetry-based text line detection in natural scenes. In: CVPR. IEEE Computer Society, pp 2558–2567
Zhang C, Yan J, Li C, Rui X, Liu L, Bie R (2016) On estimating air pollution from photos using convolutional neural network. In: ACM multimedia. ACM, pp 297–301
Zhang Z, Zhang C, Shen W, Yao C, Liu W, Bai X (2016) Multi-oriented text detection with fully convolutional networks. CoRR, arXiv:1604.04018
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61303171, 61303175) and the “Strategic Priority Research Program” of the Chinese Academy of Sciences (XDA06031000).
About this article
Cite this article
Fang, S., Xie, H., Chen, Z. et al. Detecting Uyghur text in complex background images with convolutional neural network. Multimed Tools Appl 76, 15083–15103 (2017). https://doi.org/10.1007/s11042-017-4538-8
DOI: https://doi.org/10.1007/s11042-017-4538-8