DOI: 10.1145/1873951.1874093
short-paper

Extracting captions from videos using temporal feature

Published: 25 October 2010

ABSTRACT

Captions in videos provide useful semantic information for indexing and retrieving video content. In this paper, we present an effective approach to extracting captions from videos. Its novelty lies in exploiting temporal information in both the localization and the segmentation of captions. Because it relies only on simple features such as edges, corners, and color, the approach is efficient. It involves four steps. First, we exploit the distribution of corners to detect and locate the caption spatially within a frame. Second, we localize the different captions in a video temporally by identifying changes in stroke direction. Third, we segment the caption pixels in a clip that shares the same caption, based on the consistency and dominant distribution of the caption color. Finally, the segmentation results are further refined. Experimental results on two representative movies provide preliminary validation of the approach.
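The first three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no thresholds, feature definitions, or data layouts, so the function names, the band height, the L1 histogram distance, and the quantised colour labels below are all assumptions chosen for illustration. Corner positions and stroke-direction histograms are assumed to be computed beforehand by any detector, and the final refinement step is omitted because the abstract does not describe it.

```python
from collections import Counter


def locate_caption_rows(corners, height, band=8, min_corners=4):
    """Step 1 (sketch): return (top, bottom) rows of the corner-densest band run.

    corners: iterable of (x, y) positions from any corner detector (assumed).
    bottom is exclusive; returns None if no band has min_corners corners.
    """
    counts = [0] * ((height + band - 1) // band)
    for _x, y in corners:
        counts[y // band] += 1
    best, best_total, start = None, 0, None
    for i, c in enumerate(counts + [0]):  # trailing 0 closes the last run
        if c >= min_corners:
            if start is None:
                start = i
        elif start is not None:
            total = sum(counts[start:i])
            if total > best_total:
                best, best_total = (start * band, min(i * band, height)), total
            start = None
    return best


def split_by_stroke_change(histograms, threshold=0.5):
    """Step 2 (sketch): cut the clip where stroke-direction histograms jump.

    histograms: one stroke-direction histogram per frame (assumed input).
    Returns a list of (start, end) frame ranges, end exclusive.
    """
    if not histograms:
        return []

    def normalize(h):
        s = sum(h) or 1
        return [v / s for v in h]

    cuts = [0]
    for t in range(1, len(histograms)):
        a, b = normalize(histograms[t - 1]), normalize(histograms[t])
        if sum(abs(p - q) for p, q in zip(a, b)) > threshold:  # L1 distance
            cuts.append(t)
    cuts.append(len(histograms))
    return list(zip(cuts[:-1], cuts[1:]))


def segment_caption_pixels(frames):
    """Step 3 (sketch): mask pixels that keep the dominant stable colour.

    frames: equally sized 2-D lists of quantised colour labels, all taken
    from one clip that shows a single caption (assumed input).
    """
    first = frames[0]
    rows, cols = len(first), len(first[0])
    # A pixel is stable if its colour label is identical in every frame.
    stable = [[all(f[r][c] == first[r][c] for f in frames)
               for c in range(cols)] for r in range(rows)]
    votes = Counter(first[r][c] for r in range(rows)
                    for c in range(cols) if stable[r][c])
    if not votes:
        return [[False] * cols for _ in range(rows)]
    dominant = votes.most_common(1)[0][0]
    return [[stable[r][c] and first[r][c] == dominant
             for c in range(cols)] for r in range(rows)]
```

In this sketch, temporal consistency does the work that a single frame cannot: background pixels tend to change colour between frames, while caption pixels stay fixed for the lifetime of the caption. A static background region could still win the dominant-colour vote, which is one reason a refinement step would be needed in practice.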


Published in

MM '10: Proceedings of the 18th ACM international conference on Multimedia
October 2010
1836 pages
ISBN: 9781605589336
DOI: 10.1145/1873951

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 995 of 4,171 submissions, 24%
