Fast Text Caption Localization on Video Using Visual Rhythm

Chun, Seong Soo; Kim, Hyeokman; Jung-Rim, Kim; Oh, Sangwook; Sull, Sanghoon

doi:10.1007/3-540-45925-1_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2314)

Included in the following conference series:

International Conference on Advances in Visual Information Systems

Abstract

In this paper, a fast DCT-based algorithm is proposed to efficiently locate text captions embedded in specific areas of a video sequence by means of a visual rhythm, which can be constructed quickly by sampling certain portions of each DC image in the sequence and accumulating the samples over time. Our proposed approach is based on the observation that text captions carrying important information suitable for indexing often appear in specific areas of video frames; the sampling strategies for the visual rhythm are derived from these areas. Our method then combines contrast and temporal coherence information on the visual rhythm to detect text frames, such that each detected text frame represents a run of consecutive frames containing identical text strings. This significantly reduces the number of frames that must be examined to localize text in a video sequence. Finally, the method uses several important properties of text captions to locate the captions within the detected frames.
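The abstract describes two steps that a short sketch can make concrete: building a visual rhythm from sampled DC-image values, and grouping its columns into text frames using contrast and temporal coherence. The Python below is a simplified illustration under stated assumptions, not the authors' algorithm. DC images are assumed to be nested lists holding one value per 8x8 block; the sampling strategy is assumed to be a single horizontal block row (the paper derives its own sampling patterns from where captions appear); and the contrast measure, the coherence measure, the thresholds, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a DC image is a 2-D list holding one value per
# 8x8 block (the block's DC coefficient, proportional to the block mean).
DCImage = List[List[float]]


def visual_rhythm(dc_frames: Sequence[DCImage], row: int) -> List[List[float]]:
    """Sample one horizontal block row per frame (an assumed sampling strategy)
    and stack the samples along time; element t is the column for frame t."""
    return [list(frame[row]) for frame in dc_frames]


def contrast(column: Sequence[float]) -> float:
    """Mean absolute difference between neighbouring samples: a crude proxy for
    the strong intensity transitions that caption strokes produce."""
    diffs = [abs(a - b) for a, b in zip(column, column[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def column_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two columns: small values mean the
    sampled area is temporally coherent (e.g., the same caption is still shown)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_text_frames(
    rhythm: Sequence[Sequence[float]],
    contrast_thr: float,
    coherence_thr: float,
    min_len: int = 1,
) -> List[Tuple[int, int, int]]:
    """Group consecutive high-contrast, mutually similar columns into runs and
    return (first_frame, last_frame, representative_frame) for each run."""
    runs: List[Tuple[int, int, int]] = []
    start: Optional[int] = None

    for t, col in enumerate(rhythm):
        is_text = contrast(col) >= contrast_thr
        if start is not None:
            if is_text and column_distance(col, rhythm[t - 1]) <= coherence_thr:
                continue  # the same caption is still on screen
            if t - start >= min_len:  # the run covers frames start..t-1
                runs.append((start, t - 1, (start + t - 1) // 2))
            start = None
        if is_text:
            start = t  # a new (possibly different) caption begins here
    if start is not None and len(rhythm) - start >= min_len:
        last = len(rhythm) - 1
        runs.append((start, last, (start + last) // 2))
    return runs


def make_frame(bottom_row: Sequence[float]) -> DCImage:
    """Build a synthetic 4x8-block DC image that is flat except its bottom row."""
    flat = [100.0] * 8
    return [flat[:], flat[:], flat[:], list(bottom_row)]


# Synthetic example: blank, caption A, caption B, blank (hypothetical values).
BLANK = [100.0] * 8
CAPTION_A = [30.0, 220.0] * 4
CAPTION_B = [220.0, 220.0, 30.0, 30.0] * 2

frames = ([make_frame(BLANK)] * 3 + [make_frame(CAPTION_A)] * 4
          + [make_frame(CAPTION_B)] * 3 + [make_frame(BLANK)] * 2)
rhythm = visual_rhythm(frames, row=3)  # sample the bottom block row
runs = detect_text_frames(rhythm, contrast_thr=50.0, coherence_thr=10.0)
print(runs)  # [(3, 6, 4), (7, 9, 8)]
```

Each returned triple stands for a run of frames showing the same caption, so only its representative frame would need to be passed on to text localization; this mirrors the reduction in frames to examine that the abstract describes. The localization step itself, which relies on properties of text captions not detailed in the abstract, is not sketched here.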

Author information

Authors and Affiliations

School of Electrical Engineering, Korea University, Seoul, Korea
Seong Soo Chun, Kim Jung-Rim, Sangwook Oh & Sanghoon Sull
School of Computer Science, Kookmin University, Seoul, Korea
Hyeokman Kim

Editor information

Editors and Affiliations

Knowledge Systems Institute, 3420 Main Street, Skokie, IL 60076, USA
Shi-Kuo Chang
Dept. of Computer Science & Information Engineering, National Chiao Tung University, 1001 Ta Hsueh Road, Hsin Chu, Taiwan
Zen Chen & Suh-Yin Lee

About this paper

Cite this paper

Chun, S.S., Kim, H., Jung-Rim, K., Oh, S., Sull, S. (2002). Fast Text Caption Localization on Video Using Visual Rhythm. In: Chang, SK., Chen, Z., Lee, SY. (eds) Recent Advances in Visual Information Systems. VISUAL 2002. Lecture Notes in Computer Science, vol 2314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45925-1_24

DOI: https://doi.org/10.1007/3-540-45925-1_24
Published: 23 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43358-3
Online ISBN: 978-3-540-45925-5
eBook Packages: Springer Book Archive
