Abstract
Text present in images provides important information for automatic annotation, indexing and retrieval, which makes its extraction a well-established research area in computer vision. However, variations in text orientation, alignment, font, size and contrast, together with complex backgrounds, make text extraction extremely challenging. In this paper, we propose an efficient method that extracts text regions even against complex backgrounds using the discrete wavelet transform (DWT) and k-means clustering, followed by a voting decision process. Because text exhibits abrupt variations and irregular texture in the wavelet domain, the wavelet transform is well suited to segmenting text from background. A small overlapping sliding window scans the high-frequency sub-bands, and texture features are extracted from each window position. Based on these features, k-means clustering classifies the image into text and background clusters. Finally, a voting decision process and area-based filtering locate the text regions accurately. We evaluate the performance of the method for different wavelet functions and decomposition levels on four standard datasets (ICDAR 2013, KAIST, MSRA-TD500 and SVT) and on a dataset we created ourselves. The performance analysis indicates that the method is robust and efficient for extracting text regions under various conditions.
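The pipeline outlined in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the single-level Haar wavelet, the mean absolute coefficient used as the texture feature, the window size and step, the deterministic two-cluster initialisation and the majority-vote rule are all assumptions chosen for illustration, and the area-based filtering step is omitted. All function names are hypothetical.

```python
def haar_dwt2(img):
    """Single-level 2-D Haar DWT of a grayscale image given as a list of rows."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    ll, lh, hl, hh = ([[0.0] * cols for _ in range(rows)] for _ in range(4))
    for i in range(rows):
        for j in range(cols):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2.0  # approximation
            lh[i][j] = (a + b - c - d) / 2.0  # top row minus bottom row
            hl[i][j] = (a - b + c - d) / 2.0  # left column minus right column
            hh[i][j] = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def window_features(bands, size, step):
    """Mean absolute coefficient per sub-band for each overlapping window (assumed feature)."""
    rows, cols = len(bands[0]), len(bands[0][0])
    positions, feats = [], []
    for r in range(0, rows - size + 1, step):
        for c in range(0, cols - size + 1, step):
            feats.append([
                sum(abs(band[i][j]) for i in range(r, r + size)
                    for j in range(c, c + size)) / (size * size)
                for band in bands
            ])
            positions.append((r, c))
    return positions, feats


def kmeans_text_flags(feats, iters=20):
    """Two-cluster k-means; the cluster with the higher-energy centre is taken as text."""
    def dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    energy = [sum(f) for f in feats]
    # Assumption: seed the centres with the lowest- and highest-energy windows.
    centers = [list(feats[energy.index(min(energy))]),
               list(feats[energy.index(max(energy))])]
    labels = [0] * len(feats)
    for _ in range(iters):
        labels = [0 if dist(f, centers[0]) <= dist(f, centers[1]) else 1 for f in feats]
        for k in (0, 1):
            members = [f for f, lab in zip(feats, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    text = 0 if sum(centers[0]) > sum(centers[1]) else 1
    return [lab == text for lab in labels]


def vote(positions, flags, shape, size):
    """Mark a coefficient as text when most of the windows covering it are text."""
    rows, cols = shape
    votes = [[0] * cols for _ in range(rows)]
    total = [[0] * cols for _ in range(rows)]
    for (r, c), is_text in zip(positions, flags):
        for i in range(r, r + size):
            for j in range(c, c + size):
                total[i][j] += 1
                votes[i][j] += is_text
    return [[total[i][j] > 0 and 2 * votes[i][j] > total[i][j]
             for j in range(cols)] for i in range(rows)]


def extract_text_mask(img, size=4, step=2):
    """DWT -> window features -> k-means -> voting; mask is at sub-band resolution."""
    _, lh, hl, hh = haar_dwt2(img)
    positions, feats = window_features((lh, hl, hh), size, step)
    flags = kmeans_text_flags(feats)
    return vote(positions, flags, (len(lh), len(lh[0])), size)


# Synthetic 32x32 test image: flat background with a block of vertical strokes.
img = [[(255 if x % 2 == 0 else 0) if 8 <= y < 24 and 8 <= x < 24 else 100
        for x in range(32)] for y in range(32)]
mask = extract_text_mask(img)
```

Here `mask` has half the resolution of the input because it is computed on the first-level sub-bands. On the synthetic image, the centre of the stroke block is labelled text and the flat corners are labelled background. Because of the majority vote, coefficients near the block boundary may be assigned to the background.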
Acknowledgements
The authors would like to thank the ECE Department, PEC University of Technology, Chandigarh, for providing the necessary facilities, and CSIR for providing the funds (grant file No. 08/423(0001)/2015-EMR-1) required to carry out this research work.
Ghai, D., Jain, N. Comparative Analysis of Multi-scale Wavelet Decomposition and k-Means Clustering Based Text Extraction. Wireless Pers Commun 109, 455–490 (2019). https://doi.org/10.1007/s11277-019-06574-w
DOI: https://doi.org/10.1007/s11277-019-06574-w