An Automatic Video Text Detection, Localization and Extraction Approach

Zhu, Chengjun; Ouyang, Yuanxin; Gao, Lei; Chen, Zhenyong; Xiong, Zhang

doi:10.1007/978-3-642-01350-8_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4879)

Included in the following conference series:

International Conference on Signal-Image Technology and Internet-Based Systems

Abstract

Text in video is a compact and accurate clue for video indexing and summarization. This paper presents an algorithm that treats a word group as a special symbol and uses a support vector machine (SVM) to detect, localize, and extract video text automatically. First, four Sobel operators are applied to obtain the edge map (EM) of the video frame, and the EM is segmented into blocks of size N×2N. Character features and character-group structure features are then extracted from each block to construct a 19-dimensional feature vector, and a pre-trained SVM classifies each block as text or non-text. Second, a dilatation-shrink process is employed to adjust the text position. Finally, text regions are enhanced using information from multiple frames. After the enhanced text region is binarized, the resulting text region with a clean background is recognized by OCR software. Experimental results show that the proposed method can detect, localize, and extract video text with high accuracy.
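
The first stage described in the abstract (edge map, then block segmentation) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not say which four Sobel kernels are used, how their responses are combined, or which dimension of a block is N and which is 2N. The sketch assumes horizontal, vertical, and two diagonal 3×3 kernels, combines them by taking the maximum absolute response, and treats blocks as N rows high and 2N columns wide. The function names `edge_map` and `split_blocks` are hypothetical.

```python
# Four 3x3 Sobel-style kernels. ASSUMPTION: the abstract only says "four
# Sobel operators"; these four directions are chosen for illustration.
SOBEL_KERNELS = [
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],   # responds to vertical edges
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],   # responds to horizontal edges
    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],   # responds to one diagonal
    [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],   # responds to the other diagonal
]


def edge_map(frame):
    """Return an edge map with the same size as `frame`.

    `frame` is a list of rows of gray levels. Each interior pixel receives
    the maximum absolute kernel response (ASSUMPTION: the combination rule
    is not given in the abstract). Border pixels are left at 0.
    """
    height, width = len(frame), len(frame[0])
    em = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            best = 0
            for kernel in SOBEL_KERNELS:
                response = sum(
                    kernel[dy][dx] * frame[y + dy - 1][x + dx - 1]
                    for dy in range(3)
                    for dx in range(3)
                )
                best = max(best, abs(response))
            em[y][x] = best
    return em


def split_blocks(em, n):
    """Yield (top, left, block) for non-overlapping blocks of the edge map.

    ASSUMPTION: a block is n rows high and 2*n columns wide. Partial
    blocks at the right and bottom edges are dropped.
    """
    height, width = len(em), len(em[0])
    for top in range(0, height - n + 1, n):
        for left in range(0, width - 2 * n + 1, 2 * n):
            block = [row[left:left + 2 * n] for row in em[top:top + n]]
            yield top, left, block


if __name__ == "__main__":
    # Synthetic 8x16 frame with a single vertical step edge in the middle.
    frame = [[0] * 8 + [100] * 8 for _ in range(8)]
    em = edge_map(frame)
    for top, left, block in split_blocks(em, 4):
        print(top, left, max(max(row) for row in block))
```

In a full pipeline, each block would next be turned into the 19-dimensional feature vector and passed to the pre-trained SVM; the abstract does not define those features, so they are not sketched here.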

Author information

Authors and Affiliations

School of Computer Science and Technology, Beihang University, No. 37 Xue Yuan Road, Haidian District, Beijing, P.R. China
Chengjun Zhu, Yuanxin Ouyang, Lei Gao, Zhenyong Chen & Zhang Xiong

Editor information

Editors and Affiliations

Dipartimento Tecnologie dell’Informazione, Università degli Studi di Milano, Via Bramante 65, 26013, Crema, Italy
Ernesto Damiani
LE2I-CNRS, Université de Bourgogne, Aile de l’Ingénieur, 21078, Dijon Cedex, France
Kokou Yetongnon, Richard Chbeir & Albert Dipanda

About this paper

Cite this paper

Zhu, C., Ouyang, Y., Gao, L., Chen, Z., Xiong, Z. (2009). An Automatic Video Text Detection, Localization and Extraction Approach. In: Damiani, E., Yetongnon, K., Chbeir, R., Dipanda, A. (eds) Advanced Internet Based Systems and Applications. SITIS 2006. Lecture Notes in Computer Science, vol 4879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01350-8_1

DOI: https://doi.org/10.1007/978-3-642-01350-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01349-2
Online ISBN: 978-3-642-01350-8
eBook Packages: Computer Science, Computer Science (R0)
