Abstract
Caption text carries rich information that can be used for video indexing and summarization. In this paper, we propose an effective caption text segmentation approach to improve OCR accuracy. First, an AlexNet CNN is trained with path-signature features for text tracking. Next, an improved adaptive thresholding method segments the caption text in individual frames. Finally, multi-frame integration is performed with gamma correction and region growing. In contrast to conventional methods, which extract video text from each frame independently, we exploit the temporal characteristics of videos to perform segmentation. Moreover, the proposed method can effectively remove complex backgrounds whose intensity is similar to that of the text. Experimental results on different videos and comparisons with other methods show the efficiency of our approach.
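The benefit of combining frames before thresholding can be illustrated with a minimal sketch. It is not the implementation described in this paper: it assumes the caption has already been tracked and aligned across frames, that the text is brighter than the background, and that intensities lie in [0, 1]. It replaces the improved adaptive thresholding with a plain local-mean threshold and omits the CNN tracking and region-growing steps. The function names and the `window`, `offset` and `gamma` defaults are hypothetical choices made for illustration.

```python
import numpy as np


def integrate_frames(frames):
    """Average aligned grayscale frames of one caption (values in [0, 1])."""
    return np.mean(np.stack(frames, axis=0), axis=0)


def gamma_correct(image, gamma=2.0):
    """Power-law correction; gamma > 1 darkens mid-tones more than bright text."""
    return np.clip(image, 0.0, 1.0) ** gamma


def local_mean_threshold(image, window=15, offset=0.05):
    """Mark pixels brighter than their local mean plus an offset as text.

    `window` is assumed to be odd; borders are handled by edge replication.
    """
    pad = window // 2
    padded = np.pad(image, pad, mode="edge")
    # Integral image with a leading zero row and column, so that every
    # window sum is obtained from four lookups.
    integral = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    integral = np.pad(integral, ((1, 0), (1, 0)), mode="constant")
    h, w = image.shape
    window_sum = (
        integral[window:window + h, window:window + w]
        - integral[:h, window:window + w]
        - integral[window:window + h, :w]
        + integral[:h, :w]
    )
    local_mean = window_sum / (window * window)
    return image > local_mean + offset


def segment_caption(frames, gamma=2.0, window=15, offset=0.05):
    """Integrate frames, apply gamma correction, then threshold locally."""
    fused = gamma_correct(integrate_frames(frames), gamma)
    return local_mean_threshold(fused, window, offset)
```

Under these assumptions, caption pixels keep the same intensity in every frame while a changing background is averaged toward a mid-gray level, so background regions that match the text intensity in any single frame no longer pass the threshold after integration.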
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Xing, ZH., Zhou, F., Tian, S., Yin, XC. (2016). Robust Segmentation for Video Captions with Complex Backgrounds. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_8
DOI: https://doi.org/10.1007/978-981-10-3005-5_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science, Computer Science (R0)