A New Multi-spectral Fusion Method for Degraded Video Text Frame Enhancement

  • Conference paper
Advances in Multimedia Information Processing -- PCM 2015 (PCM 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9314)

Abstract

Text detection and recognition in degraded video is complex and challenging due to lighting effects, sensor blur and motion blur. This paper presents a new method that derives multi-spectral images from each input video frame by studying non-linear intensity values in the Gray, R, G and B color spaces to increase the contrast of text pixels, which results in four respective multi-spectral images. We then propose multiple fusion criteria for the four multi-spectral images to enhance text information in degraded video frames. We apply a median operation to obtain a single image from the results of the multiple fusion criteria, which we name fusion-1. We further apply k-means clustering on the fused images obtained by the multiple fusion criteria to classify text clusters, which results in binary images. We then apply the same median operation to fuse the binary images into a single image, which we name fusion-2. We evaluate the enhanced images at fusion-1 and fusion-2 using quality measures such as Mean Square Error, Peak Signal to Noise Ratio and Structural Symmetry. Furthermore, the enhanced images are validated through text detection and recognition accuracies on video frames to show the effectiveness of the enhancement.
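The abstract describes the pipeline only at a high level: the exact non-linear intensity mapping and the individual fusion criteria are not specified here. The Python sketch below is therefore only an illustration of the general flow (per-channel enhancement of Gray, R, G and B, pixel-wise median fusion for fusion-1, k-means binarization followed by a second median fusion for fusion-2). The gamma mapping, the use of OpenCV, and all function and file names are assumptions for illustration, not the authors' implementation.

import cv2
import numpy as np

def spectral_images(frame_bgr, gamma=0.5):
    # Split the frame into Gray, R, G and B planes and apply a placeholder
    # non-linear (gamma) mapping to boost the contrast of text pixels.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    b, g, r = cv2.split(frame_bgr)
    planes = [p.astype(np.float32) / 255.0 for p in (gray, r, g, b)]
    return [np.power(p, gamma) for p in planes]

def fuse_median(images):
    # Pixel-wise median across the stacked images (the fusion step).
    return np.median(np.stack(images, axis=0), axis=0)

def binarize_kmeans(image):
    # Cluster pixel intensities into two groups with k-means and keep the
    # brighter cluster (assumption: text is brighter than its background).
    data = image.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1e-3)
    _, labels, centers = cv2.kmeans(data, 2, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
    text_cluster = int(np.argmax(centers))
    return (labels.reshape(image.shape) == text_cluster).astype(np.float32)

frame = cv2.imread("video_frame.png")                 # hypothetical input frame
channels = spectral_images(frame)
fusion1 = fuse_median(channels)                       # enhanced grey-level image
fusion2 = fuse_median([binarize_kmeans(c) for c in channels]) >= 0.5  # fused binary image

A pixel-wise median, rather than a mean, presumably suppresses any single channel in which the text is badly degraded; the resulting fusion-1 and fusion-2 images can then be scored against a reference with measures such as MSE and PSNR, as the abstract describes.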



Acknowledgment

The work described in this paper was supported by the Natural Science Foundation of China under Grant No. 61272218 and No. 61321491, and the Program for New Century Excellent Talents under NCET-11-0232.

Author information


Corresponding author

Correspondence to Tong Lu.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Weng, Y., Shivakumara, P., Lu, T., Meng, L.K., Woon, H.H. (2015). A New Multi-spectral Fusion Method for Degraded Video Text Frame Enhancement. In: Ho, Y.S., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol. 9314. Springer, Cham. https://doi.org/10.1007/978-3-319-24075-6_48

  • DOI: https://doi.org/10.1007/978-3-319-24075-6_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24074-9

  • Online ISBN: 978-3-319-24075-6

  • eBook Packages: Computer Science (R0)
