Robust Segmentation for Video Captions with Complex Backgrounds

Xing, Zong-Heng; Zhou, Fang; Tian, Shu; Yin, Xu-Cheng

doi:10.1007/978-981-10-3005-5_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)

Included in the following conference series:

Chinese Conference on Pattern Recognition

Abstract

Caption text contains rich information that can be used for video indexing and summarization. In this paper, we propose an effective caption text segmentation approach to improve OCR accuracy. First, an AlexNet CNN is trained with path signature for text tracking. Then, we use an improved adaptive thresholding method to segment caption text in individual frames. Finally, multi-frame integration is performed with gamma correction and region growing. In contrast to conventional methods, which extract video text from each frame independently, we exploit the temporal characteristics of videos to perform segmentation. Moreover, the proposed method can effectively remove complex backgrounds whose intensity is similar to that of the text. Experimental results on different videos and comparisons with other methods show the efficiency of our approach.
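
The abstract names the building blocks (adaptive thresholding, multi-frame integration, gamma correction, region growing) without giving their details. The following is a minimal, illustrative sketch of generic versions of these operations in plain Python, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the per-pixel minimum used for integration (which presumes bright, static captions over a changing background), the local-mean threshold rule, the order of the steps, and all parameter values. The CNN-based tracking step is omitted.

```python
from collections import deque


def gamma_correct(frame, gamma):
    """Map each 0..255 intensity v to round(255 * (v / 255) ** gamma)."""
    return [[round(255 * (v / 255) ** gamma) for v in row] for row in frame]


def integrate_min(frames):
    """Per-pixel minimum over aligned grayscale frames.

    Assumption: the caption is bright and static while the background
    changes, so the minimum keeps the text and darkens the background.
    """
    h, w = len(frames[0]), len(frames[0][0])
    return [[min(f[y][x] for f in frames) for x in range(w)] for y in range(h)]


def adaptive_threshold(frame, radius=1, offset=10):
    """Label a pixel 1 if it exceeds its local window mean by more than offset."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Window is clipped at the image border.
            window = [frame[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            if frame[y][x] > sum(window) / len(window) + offset:
                out[y][x] = 1
    return out


def region_grow(frame, seeds, tol=20):
    """Grow 4-connected regions from seed pixels given as (y, x) tuples.

    A neighbour joins when its intensity differs by at most tol from the
    pixel it was reached from.
    """
    h, w = len(frame), len(frame[0])
    mask = [[0] * w for _ in range(h)]
    for y, x in seeds:
        mask[y][x] = 1
    queue = deque(seeds)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]:
                if abs(frame[ny][nx] - frame[y][x]) <= tol:
                    mask[ny][nx] = 1
                    queue.append((ny, nx))
    return mask
```

In this sketch, a gamma value above 1 darkens mid-range intensities while leaving pure white unchanged, which widens the gap between bright text and a dimmer background before thresholding. One possible ordering is `integrate_min`, then `gamma_correct`, then `adaptive_threshold`, with `region_grow` seeded from pixels that are confidently text. The paper may combine the steps differently.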

Author information

Authors and Affiliations

Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
Zong-Heng Xing, Fang Zhou, Shu Tian & Xu-Cheng Yin

Corresponding authors

Correspondence to Fang Zhou or Xu-Cheng Yin.

Editor information

Editors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China
Xuelong Li
Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China
Xilin Chen
Tsinghua University, Beijing, China
Jie Zhou
Nanjing University of Science and Technology, Nanjing, China
Jian Yang
University of Electronic Science and Technology, Chengdu, Sichuan, China
Hong Cheng

About this paper

Cite this paper

Xing, ZH., Zhou, F., Tian, S., Yin, XC. (2016). Robust Segmentation for Video Captions with Complex Backgrounds. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_8

DOI: https://doi.org/10.1007/978-981-10-3005-5_8
Published: 22 October 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science, Computer Science (R0)
