ABSTRACT
Captions in videos carry semantic information that is useful for indexing and retrieving video content. In this paper, we present an effective approach to extracting captions from videos. Its novelty lies in exploiting temporal information in both the localization and the segmentation of captions. The approach is efficient because it relies only on simple features: edges, corners, and color. It involves four steps. First, we use the distribution of corners to detect and spatially locate the caption in a frame. Second, we temporally localize the different captions in a video by identifying changes in stroke direction. Third, we segment the caption pixels in a clip that shows the same caption, based on the consistency and dominant distribution of the caption color. Finally, we refine the segmentation results. Experiments on two representative movies provide preliminary validation of the approach.
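The segmentation step can be illustrated with a small sketch. The abstract does not give the actual procedure, so everything below is an assumption made for illustration: the function name segment_caption_pixels, the grayscale input, the thresholds stability_tol and color_tol, and the histogram bin_width are all hypothetical. The idea shown is only the general one named above: keep pixels whose intensity stays consistent across the frames of a clip, take the most common intensity among them as the dominant caption color, and mark the stable pixels close to that color.

```python
from collections import Counter


def segment_caption_pixels(frames, stability_tol=10, color_tol=20, bin_width=16):
    """Return a 0/1 mask of likely caption pixels (illustrative sketch only).

    frames: list of equally sized 2-D lists of grayscale intensities, assumed
    to be crops of the caption region from frames showing the same caption.
    All parameter names and default values are hypothetical.
    """
    height, width = len(frames[0]), len(frames[0][0])

    # Temporal consistency: keep pixels whose intensity barely changes
    # over the clip, and remember their mean intensity.
    stable = {}
    for y in range(height):
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            if max(values) - min(values) <= stability_tol:
                stable[(y, x)] = sum(values) / len(values)

    mask = [[0] * width for _ in range(height)]
    if not stable:
        return mask

    # Dominant color: centre of the most populated intensity bin
    # among the temporally stable pixels.
    bins = Counter(int(mean) // bin_width for mean in stable.values())
    dominant_bin = bins.most_common(1)[0][0]
    dominant = (dominant_bin + 0.5) * bin_width

    # Caption pixels: stable pixels whose mean is close to the dominant color.
    for (y, x), mean in stable.items():
        if abs(mean - dominant) <= color_tol:
            mask[y][x] = 1
    return mask


# Three tiny frames: bright caption pixels stay fixed, the background changes.
frames = [
    [[250, 10, 250, 90], [30, 250, 60, 250]],
    [[250, 80, 250, 20], [120, 250, 200, 250]],
    [[251, 150, 249, 200], [60, 250, 10, 250]],
]
mask = segment_caption_pixels(frames)
```

A static background would also pass the stability test in this sketch, so it only works when the background moves or when caption pixels outnumber stable background pixels. The refinement step mentioned in the abstract presumably addresses cases like this, but it is not modeled here.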