Portable and fast text detection

  • Original Paper
  • Machine Vision and Applications

Abstract

In this paper, we describe an efficient pipeline for real-time text detection that can be implemented on different architectures, with particular reference to smartphones. The pipeline follows a fairly standard scheme: segmentation, followed by classification of each segmented connected component. Segmentation is performed by a linear-time implementation of MSER, a state-of-the-art method for text detection, and we control the overall computational cost by computing a set of descriptive features as segmentation proceeds. Classification is carried out by a cascade of SVM classifiers, where each layer captures a different level of complexity through an appropriate choice of descriptive features and kernel functions. Each detected text element, or character, is finally merged into words and lines of text. Each element can then be fed to a multi-class classifier that performs character recognition; this functionality is currently under development. We report experiments that assess the text detection procedure, in terms of both detection performance and speed, on x86 and ARM processors.
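
The cascade idea described in the abstract can be illustrated with a minimal sketch: cheap stages run first and reject most non-text components, so that costlier stages only see the survivors. The stage functions, the feature names (`aspect_ratio`, `fill_ratio`, `stroke_var`) and all thresholds below are hypothetical placeholders chosen for illustration; they are not the SVM layers, features, or kernels used in the paper.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: one connected component = a dict of features.
Component = Dict[str, float]
# A stage returns True when the component may still be text.
Stage = Callable[[Component], bool]


def run_cascade(component: Component, stages: Sequence[Stage]) -> bool:
    """Accept a component only if every stage accepts it.

    Evaluation stops at the first rejection, so later (costlier)
    stages are never run on components rejected early.
    """
    for stage in stages:
        if not stage(component):
            return False
    return True


def geometry_stage(c: Component) -> bool:
    # Cheapest test (assumed): reject extremely elongated regions.
    return 0.1 <= c["aspect_ratio"] <= 10.0


def fill_stage(c: Component) -> bool:
    # Second test (assumed): fraction of the bounding box that is covered.
    return 0.2 <= c["fill_ratio"] <= 0.95


def stroke_stage(c: Component) -> bool:
    # Stand-in for the most expensive layer; a real system would call
    # a trained classifier here instead of a fixed threshold.
    return c["stroke_var"] < 0.5


# Stages ordered from cheapest to most expensive.
STAGES: List[Stage] = [geometry_stage, fill_stage, stroke_stage]


def detect(components: Sequence[Component]) -> List[Component]:
    """Return the components that pass every stage of the cascade."""
    return [c for c in components if run_cascade(c, STAGES)]
```

Because evaluation stops at the first rejecting stage, the average cost per component is dominated by the cheap early tests whenever most candidates are not text, which is the usual motivation for a cascade in a real-time setting.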

Notes

  1. http://www.dubout.ch/en/coding.html.

  2. Source code of the Text Segmentation, the libERtxt library, is available for download at https://bitbucket.org/slipguru.

  3. Source code of the general-purpose optimized classification library libMsC is available for download at https://bitbucket.org/slipguru.

  4. http://dag.cvc.uab.es/icdar2013competition.

  5. Dataset acquired for the project VIT—Vision for Innovative Transport—VII FP EU—SP4 Capacities Research for SMEs—n. 222199 http://www.vitproject.eu.

  6. http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/.

  7. https://developer.qualcomm.com/.

  8. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

  9. Eigen 3 http://eigen.tuxfamily.org.

  10. GLASSENSE is a regional project developed within the SI4Life Ligurian Regional Hub—Research and Innovation—Live Sciences http://www.si4life.com/.

Corresponding author

Correspondence to F. Odone.

Cite this article

Zini, L., Odone, F. Portable and fast text detection. Machine Vision and Applications 27, 845–859 (2016). https://doi.org/10.1007/s00138-016-0778-2

  • DOI: https://doi.org/10.1007/s00138-016-0778-2