An Efficient Method for Text Detection in Video Based on Stroke Width Similarity

Dinh, Viet Cuong; Chun, Seong Soo; Cha, Seungwook; Ryu, Hanjin; Sull, Sanghoon

doi:10.1007/978-3-540-76386-4_18


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4843)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Text appearing in video provides semantic knowledge and significant information for video indexing and retrieval systems. This paper proposes an effective method for text detection in video based on the similarity of text stroke width, where stroke width is defined as the distance between the two edges of a stroke. Based on the observation that text regions can be characterized by a dominant, fixed stroke width, edge detection with locally adaptive thresholds is first applied to retain text regions while suppressing background regions. Second, a morphological dilation operator whose structuring element size is adapted to the stroke width is used to roughly localize text regions. Finally, a new multi-frame refinement method is applied to reduce false alarms and refine the text location. Experimental results show that the proposed method is robust to different levels of background complexity and effective across different fonts (size, color) and text languages.
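The abstract does not give the details of the algorithm, so the following is only a minimal sketch of two of the ideas it names, under stated assumptions: the stroke width is estimated as the most frequent horizontal gap between consecutive edge pixels, and the dilation uses a horizontal structuring element whose half-width equals that estimate. The function names `dominant_stroke_width` and `dilate_rows`, the `max_width` cutoff, and the scanline-based estimate are all hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter


def dominant_stroke_width(edge_map, max_width=20):
    """Estimate the dominant stroke width of a binary edge map.

    edge_map is a list of rows holding 0/1 values (1 = edge pixel).
    The estimate is the most frequent horizontal gap between consecutive
    edge pixels; gaps of 1 (adjacent edge pixels) and gaps above max_width
    are ignored. This is an assumed heuristic, not the paper's procedure.
    """
    gaps = Counter()
    for row in edge_map:
        prev = None
        for x, value in enumerate(row):
            if value:
                if prev is not None:
                    gap = x - prev
                    if 1 < gap <= max_width:
                        gaps[gap] += 1
                prev = x
    if not gaps:
        return None
    best = max(gaps.values())
    # Break ties deterministically by preferring the smallest gap.
    return min(g for g, count in gaps.items() if count == best)


def dilate_rows(edge_map, width):
    """Binary dilation with a horizontal 1 x (2*width + 1) structuring element."""
    out = []
    for row in edge_map:
        n = len(row)
        new_row = [0] * n
        for x, value in enumerate(row):
            if value:
                for k in range(max(0, x - width), min(n, x + width + 1)):
                    new_row[k] = 1
        out.append(new_row)
    return out


# Toy edge map: two strokes, each bounded by edges 3 pixels apart.
edges = [
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0],
]
width = dominant_stroke_width(edges)
merged = dilate_rows(edges, width)
```

In this toy input the gap of 3 occurs more often than the gap of 5 between the two strokes, so the estimate is 3, and dilating by that amount merges both strokes into a single candidate region. Tying the structuring element to the stroke width is what lets nearby characters fuse without a fixed, font-dependent size.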

Author information

Authors and Affiliations

Department of Electronics and Computer Engineering, Korea University, 5-1 Anam-dong, Seongbuk-gu, Seoul, 136-701, Korea
Viet Cuong Dinh, Seong Soo Chun, Seungwook Cha, Hanjin Ryu & Sanghoon Sull

Editor information

Yasushi Yagi, Sing Bing Kang, In So Kweon, Hongbin Zha

About this paper

Cite this paper

Dinh, V.C., Chun, S.S., Cha, S., Ryu, H., Sull, S. (2007). An Efficient Method for Text Detection in Video Based on Stroke Width Similarity. In: Yagi, Y., Kang, S.B., Kweon, I.S., Zha, H. (eds) Computer Vision – ACCV 2007. ACCV 2007. Lecture Notes in Computer Science, vol 4843. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76386-4_18

DOI: https://doi.org/10.1007/978-3-540-76386-4_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76385-7
Online ISBN: 978-3-540-76386-4
eBook Packages: Computer Science, Computer Science (R0)
