
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5226)

Abstract

We present a heuristic approach to enhancing speech-synchronized captions for video OCR, as a preprocessing step for subsequent multimedia indexing, segmentation and retrieval. To improve efficiency, we use a bi-search (binary search) based caption transition detection method, which relies on the simple heuristic that the same caption content usually stays on screen for a period of time to allow stable viewing. We propose combining a color mask, a changing mask and a region mask to perform caption enhancement, exploiting the discriminative characteristics of captions and backgrounds. Finer enhancement of individual characters is then applied to remove small background residues. OCR experiments show that our caption enhancement approach achieves a character accuracy of 89.24%.
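
The transition-detection heuristic described above can be illustrated with a minimal Python sketch. It assumes a hypothetical caller-supplied predicate `same_caption(a, b)` that decides whether two frames show the same caption (for example by comparing their caption regions); how the authors perform that comparison is not stated here. If the two endpoints of an interval match, the interval is assumed to hold a single caption and is skipped; otherwise it is bisected until each change is pinned between two adjacent frames.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def find_transitions(
    frames: Sequence[Frame],
    same_caption: Callable[[Frame, Frame], bool],
) -> List[int]:
    """Return indices i where the caption changes between frames[i-1] and frames[i].

    `same_caption` is a hypothetical predicate supplied by the caller.
    Heuristic (from the abstract): a caption stays on screen for a while,
    so if the first and last frames of an interval show the same caption,
    the whole interval is assumed to contain no transition and is skipped.
    """
    transitions: List[int] = []

    def search(lo: int, hi: int) -> None:
        # Compare only the endpoints of the interval [lo, hi].
        if same_caption(frames[lo], frames[hi]):
            return  # assume one caption spans the whole interval
        if hi - lo == 1:
            transitions.append(hi)  # change located between adjacent frames
            return
        mid = (lo + hi) // 2
        search(lo, mid)  # left half first, so results come out in order
        search(mid, hi)

    if len(frames) >= 2:
        search(0, len(frames) - 1)
    return transitions


# Usage: captions stood in for by strings, compared by equality.
print(find_transitions(list("AAABBCCC"), lambda a, b: a == b))
```

Because an unchanged interval is dismissed after a single comparison, long-lasting captions cost few comparisons. The same assumption means a caption that disappears and reappears identically inside one interval (A, B, A) would be missed, which is the trade-off the stable-viewing heuristic accepts.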

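The abstract names the three masks used for caption enhancement but not their exact definitions. The following Python sketch shows one plausible reading under illustrative assumptions that are not taken from the paper: frames are grayscale images stored as nested lists, the color mask keeps pixels near an assumed caption intensity `text_level`, the changing mask keeps pixels whose intensity varies by at most `max_range` across the frames that share one caption, and the region mask keeps pixels inside a caption bounding box `box`. All function names and parameters are hypothetical. A pixel is treated as caption foreground only where all three masks agree.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # assumption: grayscale intensities 0..255
Mask = List[List[bool]]


def color_mask(img: Image, text_level: int, tol: int) -> Mask:
    # Keep pixels whose intensity is within `tol` of the assumed caption level.
    return [[abs(p - text_level) <= tol for p in row] for row in img]


def changing_mask(frames: Sequence[Image], max_range: int) -> Mask:
    # Caption pixels stay (nearly) constant while one caption is displayed,
    # while background pixels of moving video tend to change. Keep stable pixels.
    h, w = len(frames[0]), len(frames[0][0])
    mask: Mask = []
    for y in range(h):
        row = []
        for x in range(w):
            values = [f[y][x] for f in frames]
            row.append(max(values) - min(values) <= max_range)
        mask.append(row)
    return mask


def region_mask(h: int, w: int, box: Tuple[int, int, int, int]) -> Mask:
    # box = (top, left, bottom, right); bottom and right are exclusive.
    top, left, bottom, right = box
    return [[top <= y < bottom and left <= x < right for x in range(w)]
            for y in range(h)]


def enhance_caption(frames: Sequence[Image], text_level: int, tol: int,
                    max_range: int, box: Tuple[int, int, int, int]) -> Mask:
    # Intersect the three masks; the color test uses the first frame only.
    h, w = len(frames[0]), len(frames[0][0])
    cm = color_mask(frames[0], text_level, tol)
    ch = changing_mask(frames, max_range)
    rm = region_mask(h, w, box)
    return [[cm[y][x] and ch[y][x] and rm[y][x] for x in range(w)]
            for y in range(h)]
```

Intersecting the masks lets each one reject a different kind of background pixel: similarly colored background that moves is removed by the changing mask, static background of a different color by the color mask, and everything outside the caption area by the region mask.
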
Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xie, L., Tan, X. (2008). A Heuristic Approach to Caption Enhancement for Effective Video OCR. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues. ICIC 2008. Lecture Notes in Computer Science, vol 5226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87442-3_44

  • DOI: https://doi.org/10.1007/978-3-540-87442-3_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87440-9

  • Online ISBN: 978-3-540-87442-3

  • eBook Packages: Computer Science, Computer Science (R0)
