Abstract
Video text carries important semantic information and provides precise, meaningful clues for video indexing and retrieval. However, most previous approaches perform video text extraction and recognition separately, and the main difficulty, extracting and recognizing text against a complex background, has not been handled well. In this paper, this difficulty is addressed by combining text extraction and recognition and by using OCR feedback information. Our approach has the following features: (i) An efficient character image segmentation method is proposed that makes use of the available prior knowledge. (ii) Text extraction is performed both on text-row images and on segmented single-character images, since text-row based extraction preserves the color consistency of characters and backgrounds, while a single character has a simpler background. The best binary image is then chosen for recognition using OCR feedback. (iii) The K-means algorithm is used for extraction, which ensures that the best extraction result, namely the binary image that clearly separates text strokes from background, is among the candidates. Finally, extensive experiments and empirical evaluations on several video text images are conducted to demonstrate the satisfactory performance of the proposed approach.
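To make the idea of K-means extraction followed by OCR-guided selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes grayscale input, clusters pixel intensities with a plain one-dimensional K-means, turns each cluster into one candidate binary image, and keeps the candidate that a recognizer scores highest. The names `kmeans_1d`, `candidate_binaries`, `pick_by_feedback`, and the `ocr_score` callable are all hypothetical; the abstract does not specify the features, the number of clusters, or the form of the OCR feedback.

```python
import numpy as np


def kmeans_1d(values, k=2, iters=20):
    """Plain Lloyd's K-means on scalar intensities; returns (labels, centers)."""
    values = np.asarray(values, dtype=float)
    # Deterministic initialisation (an assumption): spread the centers
    # evenly across the observed intensity range.
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size:  # leave empty clusters where they are
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return labels, centers


def candidate_binaries(gray, k=2):
    """One candidate per cluster: that cluster's pixels are text (1), the rest 0."""
    gray = np.asarray(gray)
    labels, _ = kmeans_1d(gray.ravel(), k)
    labels = labels.reshape(gray.shape)
    return [(labels == j).astype(np.uint8) for j in range(k)]


def pick_by_feedback(candidates, ocr_score):
    """Keep the candidate that the (hypothetical) recognizer scores highest."""
    scores = [ocr_score(c) for c in candidates]
    return candidates[int(np.argmax(scores))]
```

In practice `ocr_score` would wrap a real recognizer and return something like its confidence for the candidate image. For a quick check without an OCR engine, a stand-in such as `lambda mask: 1.0 - mask.mean()` (which simply prefers the candidate with fewer foreground pixels) can be passed instead; this is only a placeholder and not a substitute for actual recognition feedback.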
This work was supported by the National Natural Science Foundation of China (Grant No. 61401023) and the Fundamental University Research Fund of BIT (Grant No. 20140842001).
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gao, G., Zhang, H., Chen, H. (2015). A Robust Video Text Extraction and Recognition Approach Using OCR Feedback Information. In: Ho, YS., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol 9314. Springer, Cham. https://doi.org/10.1007/978-3-319-24075-6_49
DOI: https://doi.org/10.1007/978-3-319-24075-6_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24074-9
Online ISBN: 978-3-319-24075-6
eBook Packages: Computer Science (R0)