Comparative Analysis of Multi-scale Wavelet Decomposition and k-Means Clustering Based Text Extraction

Wireless Personal Communications

Abstract

Text present in images provides important information for automatic annotation, indexing and retrieval, so its extraction is a well-known research area in computer vision. However, variations in text orientation, alignment, font, size and contrast, together with complex backgrounds, make text extraction extremely challenging. In this paper, we propose an efficient method that extracts text regions even against complex backgrounds, using the discrete wavelet transform (DWT) and k-means clustering along with a voting decision process. Because text shows abrupt variation and irregular texture in the wavelet transform domain, the wavelet transform appears well suited to segmenting the image. A small overlapping sliding window scans the high-frequency sub-bands, and texture features are extracted from each window. On the basis of these features, k-means clustering classifies the image into text and background clusters. Finally, the voting decision process and area-based filtering are used to locate text regions accurately. We evaluated performance while varying the wavelet function and the decomposition level. The proposed method is evaluated on four standard datasets (ICDAR 2013, KAIST, MSRA-TD500, SVT) and on a dataset created by the authors. The performance analysis indicates that the method is robust and efficient for extracting text regions under various conditions.
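
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the wavelet, the window size, the texture features or the voting rule, so the sketch assumes a one-level Haar transform, a 4 x 4 window with step 2, the mean absolute coefficient per detail sub-band as the texture feature, and a strict majority vote among the windows covering each coefficient. Area-based filtering is omitted, and all function names are hypothetical.

```python
def haar_dwt2(img):
    """One-level 2-D Haar DWT of a list-of-rows image with even height and width.
    Returns (approx, d_rows, d_cols, d_diag), each half the input size."""
    out = ([], [], [], [])
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2.0)  # low-pass in both directions
            rows[1].append((a + b - d - e) / 2.0)  # difference between the two rows
            rows[2].append((a - b + d - e) / 2.0)  # difference between the two columns
            rows[3].append((a - b - d + e) / 2.0)  # diagonal difference
        for band, row in zip(out, rows):
            band.append(row)
    return out


def window_features(details, size, step):
    """Slide a size x size window over the detail sub-bands. The feature per band
    is the mean absolute coefficient (an assumed texture measure)."""
    h, w = len(details[0]), len(details[0][0])
    positions, feats = [], []
    for r0 in range(0, h - size + 1, step):
        for c0 in range(0, w - size + 1, step):
            feats.append([
                sum(abs(band[r][c])
                    for r in range(r0, r0 + size)
                    for c in range(c0, c0 + size)) / (size * size)
                for band in details
            ])
            positions.append((r0, c0))
    return positions, feats


def kmeans_two(feats, iters=50):
    """Plain k-means with k = 2, seeded with the lowest- and highest-energy
    vectors. Returns True for windows in the higher-energy (assumed text) cluster."""
    centroids = [list(min(feats, key=sum)), list(max(feats, key=sum))]
    labels = None
    for _ in range(iters):
        new = [min((0, 1), key=lambda k: sum((x - m) ** 2
                                             for x, m in zip(f, centroids[k])))
               for f in feats]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [f for f, l in zip(feats, labels) if l == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    text = 0 if sum(centroids[0]) > sum(centroids[1]) else 1
    return [l == text for l in labels]


def vote_mask(shape, positions, is_text, size):
    """Each window votes for every coefficient it covers; a coefficient is marked
    as text when more than half of its covering windows voted text."""
    h, w = shape
    votes = [[0] * w for _ in range(h)]
    cover = [[0] * w for _ in range(h)]
    for (r0, c0), flag in zip(positions, is_text):
        for r in range(r0, r0 + size):
            for c in range(c0, c0 + size):
                cover[r][c] += 1
                votes[r][c] += flag
    return [[2 * votes[r][c] > cover[r][c] for c in range(w)] for r in range(h)]


def text_mask(img, size=4, step=2):
    """Run the whole sketch; the mask is at sub-band (half) resolution."""
    _, d_rows, d_cols, d_diag = haar_dwt2(img)
    positions, feats = window_features((d_rows, d_cols, d_diag), size, step)
    is_text = kmeans_two(feats)
    return vote_mask((len(d_rows), len(d_rows[0])), positions, is_text, size)
```

Calling `text_mask` on a grayscale image given as a list of rows returns a Boolean mask at half resolution. Because overlapping windows vote on each coefficient, a single misclassified window at a region boundary is outvoted by its neighbours, which is one plausible reading of why the abstract pairs clustering with a voting step.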

Acknowledgements

The authors would like to thank the ECE Department, PEC University of Technology, Chandigarh, for providing the necessary facilities, and CSIR for providing the funds (grant file No. 08/423(0001)/2015-EMR-1) required to carry out this research work.

Author information

Corresponding author

Correspondence to Deepika Ghai.

About this article

Cite this article

Ghai, D., Jain, N. Comparative Analysis of Multi-scale Wavelet Decomposition and k-Means Clustering Based Text Extraction. Wireless Pers Commun 109, 455–490 (2019). https://doi.org/10.1007/s11277-019-06574-w
