Fast Text Caption Localization on Video Using Visual Rhythm

  • Conference paper
Recent Advances in Visual Information Systems (VISUAL 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2314)

Abstract

This paper proposes a fast DCT-based algorithm that locates text captions embedded in specific areas of a video sequence by means of a visual rhythm, which can be constructed quickly by sampling certain portions of a DC image sequence and accumulating the samples along time. The approach is based on the observation that text captions carrying important information suitable for indexing often appear in specific areas of video frames; the sampling strategies for the visual rhythm are derived from these areas. The method then uses a combination of contrast and temporal coherence information on the visual rhythm to detect text frames, such that each detected text frame represents a run of consecutive frames containing identical text strings. This significantly reduces the number of text frames that must be examined for text localization in a video sequence. Finally, the method uses several important properties of text captions to locate the captions in the detected frames.
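
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes each DC image is a small grid of luminance values, that one vertical line (the hypothetical `column` parameter) is sampled per frame, and that a caption shows up as a high-contrast edge that stays stable over time. The function names and the thresholds `min_contrast`, `max_drift`, and `min_length` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # one DC image: rows of luminance values


def build_visual_rhythm(frames: Sequence[Frame], column: int) -> List[List[int]]:
    """Sample one vertical line per DC image and stack the samples over time.

    Returns rhythm[y][t]: the value at row y of the sampled line in frame t.
    """
    height = len(frames[0])
    return [[frame[y][column] for frame in frames] for y in range(height)]


def find_coherent_runs(
    rhythm: Sequence[Sequence[int]],
    min_contrast: int,
    max_drift: int,
    min_length: int,
) -> List[Tuple[int, int, int]]:
    """Return (row, first_frame, last_frame) for temporally stable edges.

    Assumed criteria (illustrative only): a sample is edge-like when it
    differs from the sample in the row above by at least min_contrast, and
    a run continues while the value changes by at most max_drift per frame.
    """
    runs = []
    for y in range(1, len(rhythm)):
        row, above = rhythm[y], rhythm[y - 1]
        start = None
        # Iterate one step past the end so a run reaching the last frame closes.
        for t in range(len(row) + 1):
            in_range = t < len(row)
            has_contrast = in_range and abs(row[t] - above[t]) >= min_contrast
            stable = start is None or (in_range and abs(row[t] - row[t - 1]) <= max_drift)
            if has_contrast and stable:
                if start is None:
                    start = t
                continue
            if start is not None and t - start >= min_length:
                runs.append((y, start, t - 1))
            # A high-contrast sample that only failed the drift test opens a new run.
            start = t if has_contrast else None
    return runs


if __name__ == "__main__":
    # Six 3x2 DC images; a bright caption pixel appears at row 1, column 1
    # in frames 1 through 4.
    frames = []
    for t in range(6):
        frame = [[50, 50], [50, 50], [50, 50]]
        if 1 <= t <= 4:
            frame[1][1] = 200
        frames.append(frame)

    rhythm = build_visual_rhythm(frames, column=1)
    print(find_coherent_runs(rhythm, min_contrast=100, max_drift=10, min_length=3))
    # prints [(1, 1, 4), (2, 1, 4)]
```

Each reported run spans frames that show the same stable edge, so in this sketch a single frame from the run could stand in for all of them when a later localization step is applied. Both edges of the bright pixel are reported because contrast is measured against the row above.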

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chun, S.S., Kim, H., Jung-Rim, K., Oh, S., Sull, S. (2002). Fast Text Caption Localization on Video Using Visual Rhythm. In: Chang, SK., Chen, Z., Lee, SY. (eds) Recent Advances in Visual Information Systems. VISUAL 2002. Lecture Notes in Computer Science, vol 2314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45925-1_24

  • DOI: https://doi.org/10.1007/3-540-45925-1_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43358-3

  • Online ISBN: 978-3-540-45925-5

  • eBook Packages: Springer Book Archive
