Detecting both superimposed and scene text with multiple languages and multiple alignments in video

Multimedia Tools and Applications

Abstract

Video text often contains highly useful semantic information that can contribute significantly to video retrieval and understanding. Video text can be classified into scene text and superimposed text. Most previous methods detect superimposed text or scene text separately, because the two types differ in text alignment. Moreover, because characters in different languages have different edge and texture features, detecting multilingual text is very difficult. In this paper, we first perform a detailed analysis of the motion patterns of video text and show that superimposed text and scene text exhibit different motion patterns across consecutive frames, and that these patterns are insensitive to the language of the characters and to text alignment. Based on this analysis, we define the Motion Perception Field (MPF) to represent text motion patterns. Finally, we propose text detection algorithms using MPF for both superimposed and scene text with multiple languages and multiple alignments. Experimental results on diverse videos demonstrate that our algorithms are robust and outperform previous methods for detecting both superimposed and scene text with multiple languages and multiple alignments.
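The abstract does not define the Motion Perception Field, so the sketch below does not reproduce it. It only illustrates the kind of per-region motion measurement that a motion-based approach could start from, under one assumption that the abstract does not state: that overlaid captions often stay fixed on screen while scene text moves with the camera or with objects. It uses plain exhaustive block matching (sum of absolute differences, SAD) between two consecutive grayscale frames. The function names (`block_motion`, `is_static`, `demo_frames`), the block size, the search radius, and the "all blocks have zero motion" rule are hypothetical choices made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

Frame = List[List[int]]   # grayscale frame indexed as frame[row][col]
Block = Tuple[int, int]   # (top, left) corner of a block
Motion = Tuple[int, int]  # (rows moved down, cols moved right) since the previous frame


def sad(prev: Frame, curr: Frame, top: int, left: int,
        dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between the size x size block of `curr`
    at (top, left) and the block of `prev` offset by (dy, dx)."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(curr[top + r][left + c] - prev[top + dy + r][left + dx + c])
    return total


def block_motion(prev: Frame, curr: Frame, size: int = 4,
                 radius: int = 1) -> Dict[Block, Motion]:
    """Exhaustive block matching (hypothetical helper, NOT the paper's MPF).

    For every non-overlapping block of `curr`, try every offset within
    +/- radius in `prev` and keep the one with the lowest SAD. The returned
    motion is how far the block's content moved from `prev` to `curr`."""
    h, w = len(curr), len(curr[0])
    vectors: Dict[Block, Motion] = {}
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            best_key, best_offset = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Skip offsets whose matching block would leave the previous frame.
                    if not (0 <= top + dy and top + dy + size <= h
                            and 0 <= left + dx and left + dx + size <= w):
                        continue
                    cost = sad(prev, curr, top, left, dy, dx, size)
                    # On equal cost, prefer the smaller offset so flat areas read as static.
                    key = (cost, abs(dy) + abs(dx))
                    if best_key is None or key < best_key:
                        best_key, best_offset = key, (dy, dx)
            # Content found at offset (dy, dx) in prev has moved by (-dy, -dx).
            vectors[(top, left)] = (-best_offset[0], -best_offset[1])
    return vectors


def is_static(vectors: Dict[Block, Motion], blocks: Iterable[Block]) -> bool:
    """Hypothetical rule: a candidate region is 'static' if none of its blocks moved."""
    return all(vectors[b] == (0, 0) for b in blocks)


def demo_frames(width: int = 12, height: int = 8) -> Tuple[Frame, Frame]:
    """Toy frame pair: the top half is unchanged (like an overlaid caption),
    the bottom half shifts right by one pixel (like content moving in the scene)."""
    prev = [[20 * r + 7 * c for c in range(width)] for r in range(height)]
    curr = [row[:] for row in prev]
    for r in range(height // 2, height):
        curr[r] = [prev[r][0]] + prev[r][:-1]  # shift right; leftmost column repeated
    return prev, curr


if __name__ == "__main__":
    prev, curr = demo_frames()
    for block, motion in sorted(block_motion(prev, curr).items()):
        print(block, motion)
```

On the toy pair, the blocks in the top half report `(0, 0)` and the interior blocks of the bottom half report `(0, 1)`. The bottom-left block also reports `(0, 0)`, because its true match would lie outside the frame; plain block matching has this border limitation, and a real detector would need to handle it (for example, with dense optical flow instead of coarse blocks).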

Figs. 1–15

Notes

  1. China Central Television (CCTV) is a national television station of the People’s Republic of China. It began trial broadcasting on May 1, 1958, and was formally launched on September 2 of the same year. As the most important medium in China, CCTV not only provides information to the general public throughout China but also serves as an open window between China and the rest of the world.

Acknowledgment

The authors would like to thank the reviewers for their thorough comments and suggestions, which helped to improve this paper. This work is supported by the National Natural Science Foundation for Distinguished Young Scholars under Grant No. 60925010; the National Natural Science Foundation of China under Grant No. 60833009; the Cosponsored Project of Beijing Committee of Education; the Funds for Creative Research Groups of China under Grant No. 61121001; and the Program for Changjiang Scholars and Innovative Research Team in University under Grant No. IRT1049.

Author information

Corresponding author

Correspondence to Huadong Ma.

About this article

Cite this article

Huang, X., Ma, H., Ling, C.X. et al. Detecting both superimposed and scene text with multiple languages and multiple alignments in video. Multimed Tools Appl 70, 1703–1727 (2014). https://doi.org/10.1007/s11042-012-1201-2

  • DOI: https://doi.org/10.1007/s11042-012-1201-2
