ABSTRACT
Scene text detection has been studied for a long time, and many approaches have achieved promising performance. Most of them treat text as a specific kind of object and detect it with popular object detection frameworks. However, scene text differs from general objects in orientation, size, and aspect ratio. In this paper, we present an end-to-end multi-oriented scene text detection approach that combines an object detection framework with position-sensitive segmentation. For a given image, features are extracted by a fully convolutional network and fed simultaneously into two branches: a text detection branch that generates candidates and a position-sensitive segmentation branch that generates segmentation maps. Finally, the candidates generated by the text detection branch are projected onto the position-sensitive segmentation maps for filtering. Position-sensitive segmentation improves the expressiveness of the network, and filtering the candidates against the segmentation maps substantially improves precision. Experiments on the ICDAR2015 and COCO-Text datasets demonstrate that the proposed method outperforms previous state-of-the-art methods. On ICDAR2015, the proposed method achieves an F-score of 0.83 and a precision of 0.87.
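The final filtering step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes axis-aligned candidate boxes (the paper handles multi-oriented text), a hypothetical 2x2 grid of position-sensitive maps, mean pooling per grid cell, and an arbitrary score threshold of 0.5. The function names `position_sensitive_score` and `filter_candidates` are illustrative only.

```python
import numpy as np


def position_sensitive_score(seg_maps, box, grid=2):
    """Score one candidate box against position-sensitive segmentation maps.

    seg_maps: array of shape (grid*grid, H, W); map k is assumed to respond
              to the k-th relative position inside a text region
              (row-major: top-left, top-right, bottom-left, bottom-right).
    box:      (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    Assumption: axis-aligned boxes and mean pooling, chosen for brevity.
    """
    x0, y0, x1, y1 = box
    # Split the box into grid x grid cells.
    xs = np.linspace(x0, x1, grid + 1).round().astype(int)
    ys = np.linspace(y0, y1, grid + 1).round().astype(int)
    cell_scores = []
    for r in range(grid):
        for c in range(grid):
            # Each cell is pooled only from the map for its own position.
            cell = seg_maps[r * grid + c, ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
            cell_scores.append(cell.mean() if cell.size else 0.0)
    return float(np.mean(cell_scores))


def filter_candidates(seg_maps, boxes, threshold=0.5, grid=2):
    """Keep candidates whose position-sensitive score reaches the threshold."""
    return [b for b in boxes
            if position_sensitive_score(seg_maps, b, grid) >= threshold]
```

A candidate survives only if each of its parts lands on the map that is expected to fire for that part. This is the sense in which filtering against the segmentation maps can raise precision: a box over background, or a box badly misaligned with the text, scores low.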
Index Terms
- A Multi-Oriented Scene Text Detector with Position-Sensitive Segmentation