Skip to main content
Log in

Extraction of special effects caption text events from digital video

  • Original Research Paper
  • Published:
International Journal on Document Analysis and Recognition Aims and scope Submit manuscript

Abstract. The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's semantic content. This is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary color, size, and orientation. Backgrounds may be complex and changing. Most work so far has made restrictive assumptions about the nature of text occurring in video. Such work is therefore not directly applicable to unconstrained, general-purpose video. In addition, most work so far has focused only on detecting the spatial extent of text in individual video frames. However, text occurring in video usually persists for several seconds. This constitutes a text event that should be entered only once in the video index. Therefore it is also necessary to determine the temporal extent of text events. This is a non-trivial problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television programs and commercials but so far have received little attention in the literature. This paper discusses detecting, binarizing, and tracking caption text in general-purpose MPEG-1 video. Solutions are proposed for each of these problems and compared with existing work found in the literature.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

Author information

Authors and Affiliations

Authors

Additional information

Received: January 29, 2002 / Accepted: September 13, 2002

D. Crandall is now with Eastman Kodak Company, 1700 Dewey Avenue, Rochester, NY 14650-1816, USA; e-mail: david.crandall@kodak.com

S. Antani is now with the National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA; e-mail: antani@nlm.nih.gov

Correspondence to: David Crandall

Rights and permissions

Reprints and permissions

About this article

Cite this article

Crandall, D., Antani, S. & Kasturi, R. Extraction of special effects caption text events from digital video. IJDAR 5, 138–157 (2003). https://doi.org/10.1007/s10032-002-0091-7

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10032-002-0091-7

Navigation