
Robust Segmentation for Video Captions with Complex Backgrounds

  • Conference paper
  • In: Pattern Recognition (CCPR 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)


Abstract

Caption text contains rich information that can be used for video indexing and summarization. In this paper, we propose an effective caption text segmentation approach to improve OCR accuracy. First, an AlexNet CNN is trained on path-signature features for text tracking. Then, an improved adaptive thresholding method is used to segment caption text in individual frames. Finally, multi-frame integration is performed with gamma correction and region growing. In contrast to conventional methods, which extract video text from individual frames independently, we exploit the temporal characteristics specific to video to perform segmentation. Moreover, the proposed method can effectively remove complex backgrounds whose intensity is similar to that of the text. Experimental results on different videos and comparisons with other methods demonstrate the effectiveness of our approach.
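The pipeline summarized above can be sketched in code. The following Python snippet is a minimal, hypothetical illustration of the frame-level steps only, assuming OpenCV and grayscale crops of one tracked caption as input: adaptive thresholding per frame, gamma-corrected multi-frame integration, and a simple connected-component filter standing in for the region-growing cleanup. All function names and parameter values (block size, C, gamma, minimum area) are illustrative assumptions rather than the authors' implementation, and the AlexNet/path-signature tracking stage is omitted.

```python
# Hypothetical sketch (not the authors' code): per-frame adaptive thresholding,
# gamma-corrected multi-frame integration, and a simple connected-component
# filter as a stand-in for the region-growing cleanup described in the abstract.
import cv2
import numpy as np


def segment_frame(gray, block_size=31, C=10):
    """Binarize caption text in one grayscale frame with adaptive thresholding."""
    # Assumes bright text on a darker, non-uniform background.
    return cv2.adaptiveThreshold(gray, 255,
                                 cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY,
                                 block_size, C)


def integrate_frames(binary_frames, gamma=0.5):
    """Fuse binarized frames of the same tracked caption.

    Text pixels recur across frames while background pixels fluctuate, so a
    gamma-corrected temporal average emphasizes the stable text responses.
    """
    stack = np.stack(binary_frames).astype(np.float32) / 255.0
    mean_map = stack.mean(axis=0)
    corrected = np.power(mean_map, gamma)  # gamma correction (gamma < 1 boosts)
    fused8 = (corrected * 255).astype(np.uint8)
    _, fused = cv2.threshold(fused8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return fused


def clean_regions(fused, min_area=20):
    """Keep only sufficiently large connected components (background removal)."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(fused, connectivity=8)
    mask = np.zeros_like(fused)
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            mask[labels == i] = 255
    return mask


def segment_caption(gray_frames):
    """End-to-end sketch: threshold each frame, integrate over time, clean up."""
    binaries = [segment_frame(f) for f in gray_frames]
    return clean_regions(integrate_frames(binaries))
```

In this sketch, gray_frames would be the aligned grayscale crops of a single caption collected over the frames in which it is tracked; fusing them before the final cleanup is what allows recurring text pixels to dominate fluctuating background pixels.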

Author information

Correspondence to Fang Zhou or Xu-Cheng Yin.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Xing, ZH., Zhou, F., Tian, S., Yin, XC. (2016). Robust Segmentation for Video Captions with Complex Backgrounds. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_8

  • DOI: https://doi.org/10.1007/978-981-10-3005-5_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3004-8

  • Online ISBN: 978-981-10-3005-5

  • eBook Packages: Computer Science, Computer Science (R0)
