Abstract
This paper proposes a method for segmenting text areas in images, using the texture features of the varied input images found in social multimedia networks together with an artificial neural network. The method consists of four main steps: extracting candidate text areas, localizing the text areas, separating the text from the background, and verifying the candidate text areas. In the extraction step, candidate blocks likely to contain text are segmented from the input image on the basis of their texture features. In the localization step, only the text strings are extracted from the candidate text blocks. In the separation step, the text is separated from the background within the localized text blocks. In the verification step, an artificial neural network checks whether each extracted block contains actual text, and non-text blocks are excluded. In experiments on various news and non-news images, the proposed method extracted text regions more accurately than existing methods.
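The four steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the texture features, the localization rule, the separation technique, or the network architecture. The sketch assumes mean gradient magnitude per block as the texture feature, a single bounding box for localization, Otsu thresholding for text and background separation, and a caller-supplied `verifier` callable standing in for the trained artificial neural network. All function names, the block size, and the threshold value are hypothetical.

```python
import numpy as np

def block_texture_energy(gray, block=8):
    """Mean absolute gradient per block; a stand-in texture feature (assumption)."""
    g = gray.astype(float)
    gx = np.abs(np.diff(g, axis=1, prepend=g[:, :1]))  # horizontal differences
    gy = np.abs(np.diff(g, axis=0, prepend=g[:1, :]))  # vertical differences
    h, w = g.shape
    hb, wb = h // block, w // block
    e = (gx + gy)[:hb * block, :wb * block].reshape(hb, block, wb, block)
    return e.mean(axis=(1, 3))

def localize(mask, block=8):
    """Step 2: pixel bounding box (top, left, bottom, right) of flagged blocks."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return (rows.min() * block, cols.min() * block,
            (rows.max() + 1) * block, (cols.max() + 1) * block)

def otsu_threshold(region):
    """Step 3 helper: gray level that maximizes between-class variance."""
    hist = np.bincount(region.ravel().astype(np.uint8), minlength=256)
    p = hist / hist.sum()
    omega = np.cumsum(p)                 # probability of the lower class
    mu = np.cumsum(p * np.arange(256))   # cumulative mean of the lower class
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def segment_text(gray, verifier, block=8, thresh=20.0):
    """Run the four steps on an 8-bit grayscale image; return (box, binary) or None."""
    mask = block_texture_energy(gray, block) > thresh  # step 1: candidate blocks
    box = localize(mask, block)                        # step 2: localization
    if box is None:
        return None
    top, left, bottom, right = box
    region = gray[top:bottom, left:right]
    binary = region > otsu_threshold(region)           # step 3: text vs background
    return (box, binary) if verifier(binary) else None # step 4: verification

def ratio_verifier(binary):
    """Placeholder for the ANN verifier: accept regions with mixed pixels."""
    return 0.05 < binary.mean() < 0.95
```

Calling `segment_text(image, ratio_verifier)` on an 8-bit grayscale array returns the bounding box and a binary mask of the accepted region, or `None` when no block passes the texture test or the verifier rejects the region. A trained classifier can be substituted for `ratio_verifier` as long as it accepts the binary mask and returns a boolean.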
Acknowledgments
This work was supported by the ICT R&D program of MSIP/IITP [2014 (R0112-14-1014), The Development of Open Platform for Service of Convergence Contents].
Cite this article
Kim, SH., An, KJ., Jang, SW. et al. Texture feature-based text region segmentation in social multimedia data. Multimed Tools Appl 75, 12815–12829 (2016). https://doi.org/10.1007/s11042-015-3237-6
DOI: https://doi.org/10.1007/s11042-015-3237-6