DOI: 10.1145/3206025.3206043
research-article

A Multi-Oriented Scene Text Detector with Position-Sensitive Segmentation

Published: 05 June 2018

ABSTRACT

Scene text detection has been studied for a long time, and many approaches have achieved promising performance. Most approaches regard text as a specific kind of object and apply popular object detection frameworks to detect it. However, scene text differs from general objects in orientation, size and aspect ratio. In this paper, we present an end-to-end multi-oriented scene text detection approach that combines an object detection framework with position-sensitive segmentation. For a given image, features are extracted by a fully convolutional network. The features are then fed simultaneously into a text detection branch, which generates candidates, and a position-sensitive segmentation branch, which generates segmentation maps. Finally, the candidates generated by the text detection branch are projected onto the position-sensitive segmentation maps for filtering. Position-sensitive segmentation improves the expressiveness of the proposed network, and filtering the candidates against the segmentation maps substantially improves precision. Experiments on the ICDAR2015 and COCO-Text datasets demonstrate that the proposed method outperforms previous state-of-the-art methods. On ICDAR2015, the proposed method achieves an F-score of 0.83 and a precision of 0.87.
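
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how candidates are scored against the segmentation maps, so everything below is an assumption made for illustration: candidates are simplified to axis-aligned boxes (the paper handles multi-oriented text), the maps are laid out as a k x k grid of channels in the style of position-sensitive score maps, and the names `position_sensitive_score`, `filter_candidates`, `k` and `threshold` are hypothetical.

```python
import numpy as np


def position_sensitive_score(seg_maps, box, k):
    """Average position-sensitive response inside one candidate box.

    seg_maps: array of shape (k*k, H, W) with values in [0, 1]; channel
              row*k + col is assumed to respond to that cell of a text region.
    box:      (x0, y0, x1, y1) pixel coordinates inside the image, end-exclusive.
    """
    x0, y0, x1, y1 = box
    xs = np.linspace(x0, x1, k + 1)  # vertical cell boundaries
    ys = np.linspace(y0, y1, k + 1)  # horizontal cell boundaries
    bin_means = []
    for row in range(k):
        for col in range(k):
            top, left = int(np.floor(ys[row])), int(np.floor(xs[col]))
            # Guarantee at least one pixel per cell for very small boxes.
            bottom = max(int(np.ceil(ys[row + 1])), top + 1)
            right = max(int(np.ceil(xs[col + 1])), left + 1)
            patch = seg_maps[row * k + col, top:bottom, left:right]
            bin_means.append(patch.mean())
    return float(np.mean(bin_means))


def filter_candidates(seg_maps, boxes, k=2, threshold=0.5):
    """Keep only candidates whose projected score reaches the threshold."""
    return [b for b in boxes if position_sensitive_score(seg_maps, b, k) >= threshold]


# Toy example: one text region in the top-left 4x4 corner of an 8x8 image.
maps = np.zeros((4, 8, 8))
maps[0, 0:2, 0:2] = 1.0  # top-left cell
maps[1, 0:2, 2:4] = 1.0  # top-right cell
maps[2, 2:4, 0:2] = 1.0  # bottom-left cell
maps[3, 2:4, 2:4] = 1.0  # bottom-right cell
print(filter_candidates(maps, [(0, 0, 4, 4), (4, 4, 8, 8)]))  # [(0, 0, 4, 4)]
```

In this toy setup the candidate covering the text region scores 1.0 and is kept, while the background candidate scores 0.0 and is discarded, which is the sense in which such filtering can raise precision.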

        • Published in

          ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval
          June 2018
          550 pages
          ISBN: 9781450350464
          DOI: 10.1145/3206025

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          ICMR '18 Paper Acceptance Rate: 44 of 136 submissions, 32%
          Overall Acceptance Rate: 254 of 830 submissions, 31%
