A Robust Video Text Extraction and Recognition Approach Using OCR Feedback Information

Gao, Guangyu; Zhang, He; Chen, Hongting

doi:10.1007/978-3-319-24075-6_49

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9314)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

Video text carries important semantic information and provides precise, meaningful clues for video indexing and retrieval. However, most previous approaches perform video text extraction and recognition separately, and the main difficulty, extraction and recognition against complex backgrounds, is not handled well. In this paper, this difficulty is investigated by combining text extraction and recognition and by using OCR feedback information. The following features are highlighted in our approach: (i) An efficient character image segmentation method is proposed that takes most prior knowledge into account. (ii) Text extraction is performed on both text-row images and segmented single-character images, since text-row based extraction maintains the color consistency of characters and backgrounds, while a single character has a simpler background. The best binary image is then chosen for recognition using OCR feedback. (iii) The K-means algorithm is used for extraction, which ensures that the best extraction result, namely the binary image with a clear separation of text strokes and background, is among the candidates. Finally, extensive experiments and empirical evaluations on several video text images demonstrate the satisfactory performance of the proposed approach.

This work was supported by the National Natural Science Foundation of China (Grant No. 61401023) and the Fundamental University Research Fund of BIT (Grant No. 20140842001).
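
The extraction-and-feedback idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it clusters pixel values with a plain K-means, turns each cluster into a candidate binary image, and keeps the candidate that a scoring function rates highest. The names `kmeans`, `candidate_binaries`, `best_binary`, and `toy_confidence` are hypothetical, and `toy_confidence` is only a stand-in for the confidence a real OCR engine would return; the character segmentation and the text-row versus single-character comparison are omitted.

```python
import numpy as np

def kmeans(pixels, k, iters=20):
    """Plain Lloyd's K-means on an (N, C) float array; returns labels and centers."""
    # Deterministic initialization (an assumption of this sketch): take k pixels
    # spread evenly across the brightness ordering.
    order = np.argsort(pixels.sum(axis=1))
    idx = order[np.linspace(0, len(pixels) - 1, k).astype(int)]
    centers = pixels[idx].copy()
    for _ in range(iters):
        # Squared distance from every pixel to every center, shape (N, k).
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers

def candidate_binaries(image, k=2):
    """Cluster pixel values and return one boolean mask per cluster."""
    h, w = image.shape[:2]
    pixels = image.reshape(h * w, -1).astype(float)
    labels, _ = kmeans(pixels, k)
    label_map = labels.reshape(h, w)
    # Each cluster in turn is treated as the hypothetical "text" layer.
    return [label_map == j for j in range(k)]

def best_binary(candidates, ocr_confidence):
    """Keep the candidate mask that the feedback function scores highest."""
    scores = [ocr_confidence(mask) for mask in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

def toy_confidence(mask):
    # Stand-in for OCR feedback (NOT a real OCR call): assumes text strokes
    # cover a minority of the image, so sparser masks score higher.
    return 1.0 - mask.mean()

# Synthetic grayscale "text row": a bright bar of strokes on a dark background.
image = np.full((20, 60), 30.0)
image[8:12, 5:55] = 220.0

candidates = candidate_binaries(image, k=2)
text_mask, score = best_binary(candidates, toy_confidence)
```

In a real pipeline, `ocr_confidence` would wrap an OCR engine that reports recognition confidence for each candidate binary image, and the same selection step could be run over candidates from both the text-row image and the segmented character images.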

Author information

Authors and Affiliations

School of Software, Beijing Institute of Technology, Beijing, China
Guangyu Gao
Operations Office (Beijing), People’s Bank of China, Beijing, China
He Zhang
School of Computer, Beijing University of Posts and Telecom, Beijing, China
Hongting Chen

Corresponding author

Correspondence to Guangyu Gao.

Editor information

Editors and Affiliations

Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Yo-Sung Ho
Chinese Academy of Sciences, Institute of Automation, Beijing, China
Jitao Sang
ICU, IVY Lab, KAIST, Daejeon, Korea (Republic of)
Yong Man Ro
KAIST, Daejeon, Korea (Republic of)
Junmo Kim
College of Computer Science, Zhejiang University, Hangzhou, China
Fei Wu

About this paper

Cite this paper

Gao, G., Zhang, H., Chen, H. (2015). A Robust Video Text Extraction and Recognition Approach Using OCR Feedback Information. In: Ho, YS., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol 9314. Springer, Cham. https://doi.org/10.1007/978-3-319-24075-6_49

DOI: https://doi.org/10.1007/978-3-319-24075-6_49
Published: 22 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24074-9
Online ISBN: 978-3-319-24075-6
eBook Packages: Computer Science, Computer Science (R0)
