An Automatic Video Text Detection, Localization and Extraction Approach

  • Conference paper
Advanced Internet Based Systems and Applications (SITIS 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4879)

Abstract

Text in video is a compact and accurate clue for video indexing and summarization. This paper presents an algorithm that treats a word group as a special symbol and uses a support vector machine (SVM) to detect, localize, and extract video text automatically. First, four Sobel operators are applied to obtain the edge map (EM) of the video frame, and the EM is segmented into blocks of size N×2N. Character features and character-group structure features are then extracted from each block to construct a 19-dimensional feature vector, and a pre-trained SVM classifies each block as text or non-text. Second, a dilatation-shrink process is employed to adjust the text position. Finally, text regions are enhanced using information from multiple frames. After the enhanced text region is binarized, the resulting text region with a clean background is recognized by OCR software. Experimental results show that the proposed method can detect, localize, and extract video text with high accuracy.
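
The first stage described above (edge map, then N×2N blocks) can be illustrated with a minimal sketch. The abstract does not give the four kernels, how their responses are combined, or the block orientation, so the following are assumptions: the common horizontal, vertical, and two diagonal 3×3 Sobel kernels, a per-pixel maximum of absolute responses, and blocks N rows high by 2N columns wide. The names `KERNELS`, `edge_map`, and `split_blocks` are hypothetical and not taken from the paper.

```python
# Four 3x3 Sobel-style kernels. The exact kernels used in the paper are not
# stated in the abstract; these textbook choices are an assumption.
KERNELS = [
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],    # responds to vertical edges
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],    # responds to horizontal edges
    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],    # one diagonal direction
    [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],    # the other diagonal direction
]


def edge_map(gray):
    """Per-pixel maximum absolute response over the four kernels.

    `gray` is a list of rows of intensities. Border pixels are left at 0.
    Combining the responses with max() is an assumption of this sketch.
    """
    h, w = len(gray), len(gray[0])
    em = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            best = 0
            for k in KERNELS:
                r = sum(k[dy][dx] * gray[y + dy - 1][x + dx - 1]
                        for dy in range(3) for dx in range(3))
                best = max(best, abs(r))
            em[y][x] = best
    return em


def split_blocks(em, n):
    """Cut the edge map into non-overlapping blocks, n rows by 2n columns.

    Returns a list of ((top, left), block) pairs. Partial blocks at the
    right and bottom borders are dropped (an assumption of this sketch).
    """
    h, w = len(em), len(em[0])
    blocks = []
    for top in range(0, h - n + 1, n):
        for left in range(0, w - 2 * n + 1, 2 * n):
            block = [row[left:left + 2 * n] for row in em[top:top + n]]
            blocks.append(((top, left), block))
    return blocks


# Usage: a 4x8 frame with a vertical step edge between columns 3 and 4.
frame = [[0, 0, 0, 0, 10, 10, 10, 10] for _ in range(4)]
em = edge_map(frame)
blocks = split_blocks(em, 2)
```

In a full pipeline, each block would then be turned into a feature vector and passed to the pre-trained SVM; the 19 features themselves are not listed in the abstract, so they are not sketched here.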

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhu, C., Ouyang, Y., Gao, L., Chen, Z., Xiong, Z. (2009). An Automatic Video Text Detection, Localization and Extraction Approach. In: Damiani, E., Yetongnon, K., Chbeir, R., Dipanda, A. (eds) Advanced Internet Based Systems and Applications. SITIS 2006. Lecture Notes in Computer Science, vol 4879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01350-8_1

  • DOI: https://doi.org/10.1007/978-3-642-01350-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01349-2

  • Online ISBN: 978-3-642-01350-8

  • eBook Packages: Computer Science, Computer Science (R0)
