Caption analysis and recognition for building video indexing systems

Multimedia Systems

Abstract.

In this paper, we propose several methods for analyzing and recognizing Chinese video captions, which are a rich source of information about video content. To obtain clearer character images, we binarize each image by combining a global thresholding method with a window-based method, and we use a caption-tracking scheme to locate caption regions and detect caption changes. Characters are separated from potentially complex backgrounds using size and color constraints and by cross-examining images from multiple frames. To segment individual characters, we use a dynamic split-and-merge strategy. Finally, we propose a character recognition process that uses a prototype classification method, supplemented by a disambiguation step based on support vector machines (SVMs) to improve recognition, and followed by a postprocess that integrates multiple recognition results. Applied to test videos, the complete process achieves an overall accuracy rate of 94.11%.

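The abstract does not specify how the global and window-based thresholds are combined. The Python sketch below shows one plausible combination, offered as an illustration rather than the authors' method: Otsu's method supplies the global threshold, the mean of a square window supplies the local one, and a pixel counts as text only when it exceeds both. The function names, the window size, and the assumption of bright captions on a darker background are all hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def otsu_threshold(img: Image) -> int:
    """Global threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of their intensities
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Proportional to the between-class variance at threshold t.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def window_mean(img: Image, y: int, x: int, half: int) -> float:
    """Mean of the (2*half+1) x (2*half+1) window around (y, x), clipped at borders."""
    ys = range(max(0, y - half), min(len(img), y + half + 1))
    xs = range(max(0, x - half), min(len(img[0]), x + half + 1))
    vals = [img[j][i] for j in ys for i in xs]
    return sum(vals) / len(vals)


def binarize(img: Image, half: int = 2) -> Image:
    """Hypothetical combination rule: a pixel is text (1) only if it is brighter
    than BOTH the global Otsu threshold and its local window mean.
    Assumes bright caption text on a darker background."""
    t = otsu_threshold(img)
    return [
        [1 if v > t and v > window_mean(img, y, x, half) else 0 for x, v in enumerate(row)]
        for y, row in enumerate(img)
    ]
```

For example, on a 5×5 image of value 30 with a single pixel of 200 at the center, `otsu_threshold` returns 30 and `binarize` marks only the center pixel. Computing each window mean directly costs time proportional to the window area; an integral image would reduce this to constant time per pixel.
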
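Cross-examining several frames exploits the fact that superimposed captions typically stay fixed while the background changes. A minimal sketch, assuming each frame of a tracked caption region has already been binarized into a 0/1 mask of the same size: a pixel is kept only if it is text in every frame, and a caption change is flagged when consecutive masks disagree on more than a chosen fraction of their text pixels. The 0.3 threshold and the function names are hypothetical, not values from the paper.

```python
from typing import List, Sequence

Mask = List[List[int]]  # 1 = text pixel, 0 = background


def integrate_masks(masks: Sequence[Mask]) -> Mask:
    """Keep a pixel only if it is marked as text in every frame."""
    h, w = len(masks[0]), len(masks[0][0])
    return [[1 if all(m[y][x] for m in masks) else 0 for x in range(w)] for y in range(h)]


def caption_changed(prev: Mask, curr: Mask, ratio: float = 0.3) -> bool:
    """Flag a caption change when the fraction of disagreeing pixels, among pixels
    that are text in either frame, exceeds `ratio` (hypothetical threshold)."""
    differ = union = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            if p or c:
                union += 1
                if p != c:
                    differ += 1
    return union > 0 and differ / union > ratio
```
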
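Size constraints can be applied to the connected components of a binary mask: noise specks are too small to be characters, and large background blobs are too big. The sketch below labels 8-connected components with a simple breadth-first flood fill and keeps those whose longer bounding-box side lies within a given range. The range limits, the function names, and the use of the longer bounding-box side as the size measure are assumptions; the color constraint mentioned in the abstract is not shown.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def components(mask: List[List[int]]) -> List[Box]:
    """Bounding boxes of 8-connected text components, found by BFS flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def filter_by_size(boxes: List[Box], min_side: int, max_side: int) -> List[Box]:
    """Keep components whose longer bounding-box side is in [min_side, max_side]."""
    kept = []
    for top, left, bottom, right in boxes:
        longer = max(bottom - top + 1, right - left + 1)
        if min_side <= longer <= max_side:
            kept.append((top, left, bottom, right))
    return kept
```
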
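The abstract names a dynamic split-and-merge strategy without detailing it. The following is a simplified, static sketch built on a common heuristic for Chinese text, namely that characters are roughly square, so the expected character width is close to the height of the text line. Column runs containing ink are found from the vertical projection; adjacent runs are merged while the merged span stays within about one character width (joining, for example, the two halves of a left-right structured character), and runs much wider than one character are split evenly. The ratios 1.1 and 1.3 and all names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open column range [start, end)


def column_runs(mask: List[List[int]]) -> List[Segment]:
    """Maximal runs of columns that contain at least one text pixel."""
    width = len(mask[0])
    has_ink = [any(row[x] for row in mask) for x in range(width)]
    runs, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, width))
    return runs


def merge_runs(runs: List[Segment], max_width: float) -> List[Segment]:
    """Greedily merge a run into its predecessor while the span fits max_width."""
    merged: List[Segment] = []
    for seg in runs:
        if merged and seg[1] - merged[-1][0] <= max_width:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return merged


def split_runs(runs: List[Segment], char_width: float, tolerance: float) -> List[Segment]:
    """Split runs wider than tolerance * char_width into equal pieces."""
    out: List[Segment] = []
    for start, end in runs:
        width = end - start
        if width > tolerance * char_width:
            n = max(2, round(width / char_width))
            step = width / n
            out.extend((start + round(i * step), start + round((i + 1) * step)) for i in range(n))
        else:
            out.append((start, end))
    return out


def segment_line(mask: List[List[int]], merge_ratio: float = 1.1, split_ratio: float = 1.3) -> List[Segment]:
    height = len(mask)  # assumed expected character width for square characters
    runs = merge_runs(column_runs(mask), merge_ratio * height)
    return split_runs(runs, height, split_ratio)
```

With a five-row line whose inked columns are {0, 1}, {3, 4}, and 7 through 16, `segment_line` returns [(0, 5), (7, 12), (12, 17)]: the first two runs merge into one character, and the ten-column run is split into two.
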
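For the recognition stage, a minimal nearest-prototype classifier might look like the sketch below. Each class is represented by one or more prototype feature vectors, and an input is assigned to the class of its closest prototype. When the runner-up class is nearly as close as the winner, the result is flagged as ambiguous, which is the kind of case a second-stage classifier such as an SVM could then resolve among the top candidates. The feature representation, the Euclidean distance, the 0.9 ratio test, and the names are assumptions; the SVM stage itself is not implemented here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def classify(
    feature: Sequence[float],
    prototypes: Dict[str, List[Sequence[float]]],
    top_k: int = 2,
    ambiguity_ratio: float = 0.9,
) -> Tuple[str, List[str], bool]:
    """Nearest-prototype classification with a hypothetical ambiguity test.

    Each class is scored by the distance to its closest prototype. The result is
    flagged as ambiguous when best_distance >= ambiguity_ratio * runner_up_distance,
    i.e. when a second-stage classifier should decide among the candidates."""
    scores = sorted(
        (min(math.dist(feature, p) for p in protos), label)
        for label, protos in prototypes.items()
    )
    best_d, best_label = scores[0]
    ambiguous = len(scores) > 1 and best_d >= ambiguity_ratio * scores[1][0]
    candidates = [label for _, label in scores[:top_k]]
    return best_label, candidates, ambiguous
```
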
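Because a caption usually persists over many frames, the same text line may be recognized several times. One simple way to integrate these results, assumed here rather than taken from the paper, is a character-by-character majority vote over the recognition strings that share the most common length; strings of other lengths are treated as misaligned and dropped. The function name is hypothetical.

```python
from collections import Counter
from typing import Sequence


def vote(results: Sequence[str]) -> str:
    """Per-position majority vote over recognition strings of the most common length."""
    common_len = Counter(len(r) for r in results).most_common(1)[0][0]
    aligned = [r for r in results if len(r) == common_len]
    # zip(*aligned) yields, for each position, the characters all results gave there.
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*aligned))
```

Ties are broken in favor of the character encountered first, because `Counter.most_common` orders elements with equal counts by first occurrence.
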
Author information

Correspondence to Fu Chang.

Additional information

Published online: 2 February 2005

About this article

Cite this article

Chang, F., Chen, GC., Lin, CC. et al. Caption analysis and recognition for building video indexing systems. Multimedia Systems 10, 344–355 (2005). https://doi.org/10.1007/s00530-004-0159-y

  • DOI: https://doi.org/10.1007/s00530-004-0159-y
