research-article

A Multi-Oriented Scene Text Detector with Position-Sensitive Segmentation

Authors:
Peirui Cheng, University of Chinese Academy of Sciences, Beijing, China
Weiqiang Wang, University of Chinese Academy of Sciences, Beijing, China

ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, June 2018, Pages 152–159. https://doi.org/10.1145/3206025.3206043

Published: 05 June 2018

ABSTRACT

Scene text detection has been studied for a long time, and many approaches have achieved promising performance. Most approaches treat text as a specific kind of object and apply popular object detection frameworks to detect it. However, scene text differs from general objects in orientation, size, and aspect ratio. In this paper, we present an end-to-end multi-oriented scene text detection approach that combines an object detection framework with position-sensitive segmentation. For a given image, features are extracted by a fully convolutional network and fed simultaneously into a text detection branch, which generates candidates, and a position-sensitive segmentation branch, which generates segmentation maps. Finally, the candidates generated by the text detection branch are projected onto the position-sensitive segmentation maps for filtering. Position-sensitive segmentation improves the expressiveness of the network, and filtering the candidates against the segmentation maps substantially improves precision. Experiments on the ICDAR2015 and COCO-Text datasets demonstrate that the proposed method outperforms previous state-of-the-art methods. On ICDAR2015, the proposed method achieves an F-score of 0.83 and a precision of 0.87.
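
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how candidates are projected or scored, so everything below is an assumption made for illustration only: axis-aligned boxes (the paper itself targets multi-oriented text), a k x k grid with one score map per grid cell, mean pooling inside each cell, and a fixed threshold. The names position_sensitive_score and filter_candidates are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open; axis-aligned is an assumption
ScoreMap = List[List[float]]     # one H x W map of per-pixel scores in [0, 1]


def position_sensitive_score(maps: Sequence[ScoreMap], box: Box, k: int) -> float:
    """Mean of k*k cell scores, where grid cell (i, j) is read from map i*k + j.

    Assumes the box lies inside the maps and is at least k pixels on each side.
    """
    x0, y0, x1, y1 = box
    cell_scores = []
    for i in range(k):        # grid row
        for j in range(k):    # grid column
            ya = y0 + (y1 - y0) * i // k
            yb = y0 + (y1 - y0) * (i + 1) // k
            xa = x0 + (x1 - x0) * j // k
            xb = x0 + (x1 - x0) * (j + 1) // k
            cell_map = maps[i * k + j]  # the map dedicated to this relative position
            values = [cell_map[y][x] for y in range(ya, yb) for x in range(xa, xb)]
            cell_scores.append(sum(values) / len(values))
    return sum(cell_scores) / len(cell_scores)


def filter_candidates(maps: Sequence[ScoreMap], boxes: Sequence[Box],
                      k: int = 2, threshold: float = 0.5) -> List[Box]:
    """Keep only the candidates whose pooled score reaches the (assumed) threshold."""
    return [b for b in boxes if position_sensitive_score(maps, b, k) >= threshold]


def make_map(h: int, w: int, rows: range, cols: range) -> ScoreMap:
    """Toy score map: 1.0 inside the given rows/cols, 0.0 elsewhere."""
    return [[1.0 if y in rows and x in cols else 0.0 for x in range(w)] for y in range(h)]


# One toy text instance occupying columns 0-3 and rows 0-3 of a 4 x 8 image.
# Each map fires only on "its" quarter of the instance.
maps = [
    make_map(4, 8, range(0, 2), range(0, 2)),  # top-left part of a text region
    make_map(4, 8, range(0, 2), range(2, 4)),  # top-right part
    make_map(4, 8, range(2, 4), range(0, 2)),  # bottom-left part
    make_map(4, 8, range(2, 4), range(2, 4)),  # bottom-right part
]
candidates = [(0, 0, 4, 4), (2, 0, 6, 4), (4, 0, 8, 4)]
kept = filter_candidates(maps, candidates, k=2, threshold=0.5)
print(kept)
```

In this toy setup the candidate shifted two pixels to the right is rejected even though it overlaps half of the text region, because each grid cell reads from the map dedicated to its own relative position. That is the property that distinguishes position-sensitive maps from a single text/non-text mask.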

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
        Object detection
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published in
ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval
June 2018
550 pages
ISBN:9781450350464
DOI:10.1145/3206025
Conference Chairs:
Kiyoharu Aizawa, The Univ. of Tokyo, Japan
Michael Lew, Leiden Univ., Netherlands
Shin'ichi Satoh, National Inst. of Informatics, Japan
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
fully convolutional network
position-sensitive segmentation branch
scene text detection
text detection branch
Acceptance Rates
ICMR '18 Paper Acceptance Rate: 44 of 136 submissions, 32%
Overall Acceptance Rate: 254 of 830 submissions, 31%