Detecting both superimposed and scene text with multiple languages and multiple alignments in video

Huang, Xiaodong; Ma, Huadong; Ling, Charles X.; Gao, Guangyu

doi:10.1007/s11042-012-1201-2


Published: 12 August 2012

Multimedia Tools and Applications, Volume 70, pages 1703–1727 (2014)


Abstract

Video text often contains highly useful semantic information that can contribute significantly to video retrieval and understanding. Video text can be classified into scene text and superimposed text. Most previous methods detect superimposed text and scene text separately because of their different text alignments. Moreover, because characters in different languages have different edge and texture features, detecting multilingual text is very difficult. In this paper, we first perform a detailed analysis of the motion patterns of video text and show that superimposed text and scene text exhibit different motion patterns across consecutive frames, and that these patterns are insensitive to the language of the characters and to the text alignment. Based on this analysis, we define the Motion Perception Field (MPF) to represent text motion patterns. Finally, we propose text detection algorithms that use MPF to detect both superimposed and scene text with multiple languages and multiple alignments. Experimental results on diverse videos demonstrate that our algorithms are robust and outperform previous methods in detecting both superimposed and scene text with multiple languages and multiple alignments.
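The abstract does not say how MPF is computed. As a rough, hypothetical illustration of the kind of per-region motion evidence that a motion-based representation could start from, the sketch below estimates a block-wise motion field between two consecutive grayscale frames using exhaustive block matching with a sum-of-absolute-differences (SAD) cost. Block matching is a standard technique swapped in here for illustration; it is not the authors' MPF. The names `motion_field` and `static_mask`, the block size, the search radius, and the simplifying premise that a fixed caption keeps a zero displacement while scene text moves with the camera are all assumptions.

```python
from typing import List, Tuple

# A grayscale frame stored as rows of pixel intensities (0..255).
Frame = List[List[int]]
Vector = Tuple[int, int]  # (dy, dx) displacement of a block between frames


def block_sad(prev: Frame, curr: Frame, top: int, left: int,
              dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between the size x size block at
    (top, left) in `prev` and the block displaced by (dy, dx) in `curr`."""
    return sum(
        abs(prev[top + r][left + c] - curr[top + dy + r][left + dx + c])
        for r in range(size)
        for c in range(size)
    )


def motion_field(prev: Frame, curr: Frame, size: int = 4,
                 radius: int = 2) -> List[List[Vector]]:
    """Exhaustive block matching (illustrative, not the paper's MPF): for
    each non-overlapping block of `prev`, return the displacement within
    +/- `radius` pixels that minimises SAD in `curr`. A candidate must
    strictly beat the zero vector's cost, so flat regions read as static."""
    h, w = len(prev), len(prev[0])
    field: List[List[Vector]] = []
    for top in range(0, h - size + 1, size):
        row: List[Vector] = []
        for left in range(0, w - size + 1, size):
            best: Vector = (0, 0)
            best_cost = block_sad(prev, curr, top, left, 0, 0, size)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Skip candidates whose block would fall outside `curr`.
                    if not (0 <= top + dy and top + dy + size <= h
                            and 0 <= left + dx and left + dx + size <= w):
                        continue
                    cost = block_sad(prev, curr, top, left, dy, dx, size)
                    if cost < best_cost:
                        best, best_cost = (dy, dx), cost
            row.append(best)
        field.append(row)
    return field


def static_mask(field: List[List[Vector]]) -> List[List[bool]]:
    """Hypothetical cue: flag blocks with zero displacement. This assumes a
    fixed caption stays put while scene text moves with the camera; real
    superimposed text can also scroll, so this is only an illustration."""
    return [[vec == (0, 0) for vec in row] for row in field]
```

For example, if `curr` is a textured 12x12 `prev` shifted one pixel to the right, blocks away from the right border report `(0, 1)` and `static_mask` marks them as moving, while comparing a frame with itself marks every block as static. Because block matching cannot look past the frame border, blocks on the edge toward which the content moves may report a spurious vector; a real detector would need to handle this and to aggregate evidence over more than two frames.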


Notes

China Central Television (CCTV) is a national television station of the People’s Republic of China. It began trial broadcasting on May 1, 1958, and was formally launched on September 2 of the same year. As the most important medium in China, CCTV not only provides information to the general public throughout China but also serves as an open window between China and the rest of the world.


Acknowledgment

The authors would like to thank the reviewers for their thorough comments and suggestions, which helped to improve this paper. This work is supported by the National Natural Science Foundation for Distinguished Young Scholars under Grant No. 60925010; the National Natural Science Foundation of China under Grant No. 60833009; the Cosponsored Project of Beijing Committee of Education; the Funds for Creative Research Groups of China under Grant No. 61121001; and the Program for Changjiang Scholars and Innovative Research Team in University under Grant No. IRT1049.

Author information

Authors and Affiliations

Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Xiaodong Huang, Huadong Ma, Charles X. Ling & Guangyu Gao
Capital Normal University, Beijing, 100048, China
Xiaodong Huang


Corresponding author

Correspondence to Huadong Ma.


About this article

Cite this article

Huang, X., Ma, H., Ling, C.X. et al. Detecting both superimposed and scene text with multiple languages and multiple alignments in video. Multimed Tools Appl 70, 1703–1727 (2014). https://doi.org/10.1007/s11042-012-1201-2


Issue Date: June 2014
DOI: https://doi.org/10.1007/s11042-012-1201-2
