A New Multi-spectral Fusion Method for Degraded Video Text Frame Enhancement

  • Conference paper
Advances in Multimedia Information Processing -- PCM 2015 (PCM 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9314)

Abstract

Text detection and recognition in degraded video is complex and challenging due to lighting effects, sensor blur and motion blur. This paper presents a new method that derives multi-spectral images from each input video frame by studying non-linear intensity values in the Gray, R, G and B color spaces to increase the contrast of text pixels, which results in four respective multi-spectral images. We then propose multiple fusion criteria for the four multi-spectral images to enhance text information in degraded video frames. We apply a median operation to obtain a single image from the results of the multiple fusion criteria, which we name fusion-1. We further apply k-means clustering on the fused images obtained by the multiple fusion criteria to classify text clusters, which results in binary images. We then apply the same median operation to fuse the binary images into a single image, which we name fusion-2. We evaluate the enhanced images at fusion-1 and fusion-2 using quality measures such as Mean Square Error, Peak Signal to Noise Ratio and Structural Symmetry. Furthermore, the enhanced images are validated through text detection and recognition accuracies on video frames to show the effectiveness of the enhancement.
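The abstract describes the pipeline only at a high level: the exact non-linear intensity mapping and the individual fusion criteria are not specified here. The Python sketch below is therefore only an illustration of the general flow (per-channel enhancement of Gray, R, G and B, pixel-wise median fusion for fusion-1, k-means binarization followed by a second median fusion for fusion-2). The gamma mapping, the use of OpenCV, and all function and file names are assumptions for illustration, not the authors' implementation.

import cv2
import numpy as np

def spectral_images(frame_bgr, gamma=0.5):
    # Split the frame into Gray, R, G and B planes and apply a placeholder
    # non-linear (gamma) mapping to boost the contrast of text pixels.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    b, g, r = cv2.split(frame_bgr)
    planes = [p.astype(np.float32) / 255.0 for p in (gray, r, g, b)]
    return [np.power(p, gamma) for p in planes]

def fuse_median(images):
    # Pixel-wise median across the stacked images (the fusion step).
    return np.median(np.stack(images, axis=0), axis=0)

def binarize_kmeans(image):
    # Cluster pixel intensities into two groups with k-means and keep the
    # brighter cluster (assumption: text is brighter than its background).
    data = image.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1e-3)
    _, labels, centers = cv2.kmeans(data, 2, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
    text_cluster = int(np.argmax(centers))
    return (labels.reshape(image.shape) == text_cluster).astype(np.float32)

frame = cv2.imread("video_frame.png")                 # hypothetical input frame
channels = spectral_images(frame)
fusion1 = fuse_median(channels)                       # enhanced grey-level image
fusion2 = fuse_median([binarize_kmeans(c) for c in channels]) >= 0.5  # fused binary image

A pixel-wise median, rather than a mean, presumably suppresses any single channel in which the text is badly degraded; the resulting fusion-1 and fusion-2 images can then be scored against a reference with measures such as MSE and PSNR, as the abstract describes.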



Acknowledgment

The work described in this paper was supported by the Natural Science Foundation of China under Grant No. 61272218 and No. 61321491, and the Program for New Century Excellent Talents under NCET-11-0232.

Author information


Corresponding author

Correspondence to Tong Lu.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Weng, Y., Shivakumara, P., Lu, T., Meng, L.K., Woon, H.H. (2015). A New Multi-spectral Fusion Method for Degraded Video Text Frame Enhancement. In: Ho, Y.S., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol. 9314. Springer, Cham. https://doi.org/10.1007/978-3-319-24075-6_48

  • DOI: https://doi.org/10.1007/978-3-319-24075-6_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24074-9

  • Online ISBN: 978-3-319-24075-6

  • eBook Packages: Computer Science (R0)
