Abstract
In this paper, we propose an algorithm for extracting text regions from images in spam mail. The Color Layer-Based Text Extraction (CLTE) algorithm divides the input image into eight planes, which serve as color layers. It extracts connected components from each of the eight planes and then classifies them as either text or non-text regions. We also propose an algorithm to recover damaged text strokes in Korean text images. There are two types of damaged strokes: (1) middle strokes such as ‘⌉’ or ‘—’ that are deleted, and (2) first and last strokes such as ‘∘’ or ‘□’ that are filled with black pixels. An experiment with 200 spam-mail images shows that the proposed approach is more accurate than conventional methods by over 10%.
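The three stages named in the abstract (split into eight color layers, extract connected components, classify them) can be illustrated with a minimal sketch. The abstract does not say how the eight planes are formed or which features drive the classification, so everything specific here is an assumption made for illustration: the planes are obtained by thresholding the R, G, and B channels at 128 (giving 2 × 2 × 2 = 8 combinations), components are 4-connected, and the text test is a simple size and aspect-ratio heuristic. The function names and parameters are hypothetical and are not taken from the paper.

```python
from collections import deque


def split_into_color_layers(image, threshold=128):
    """Assign every pixel to one of 8 binary planes.

    Assumed rule (not from the paper): each RGB channel is thresholded,
    and the three resulting bits select the plane index 0..7.
    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    height, width = len(image), len(image[0])
    layers = [[[0] * width for _ in range(height)] for _ in range(8)]
    for y in range(height):
        for x in range(width):
            r, g, b = image[y][x]
            index = (int(r >= threshold) << 2) | (int(g >= threshold) << 1) | int(b >= threshold)
            layers[index][y][x] = 1
    return layers


def connected_components(plane):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected regions of 1s."""
    height, width = len(plane), len(plane[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if plane[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and plane[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def looks_like_text(box, min_size=2, max_size=40, max_aspect=5.0):
    """Placeholder classifier (assumed heuristic): keep character-sized boxes."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0 + 1, y1 - y0 + 1
    if not (min_size <= w <= max_size and min_size <= h <= max_size):
        return False
    return max(w, h) / min(w, h) <= max_aspect


def extract_text_boxes(image):
    """Run the three stages and collect candidate text boxes over all planes."""
    candidates = []
    for plane in split_into_color_layers(image):
        candidates.extend(box for box in connected_components(plane) if looks_like_text(box))
    return candidates
```

Calling `extract_text_boxes` on a list-of-rows RGB image returns the bounding boxes that pass the heuristic. A real classifier would presumably use richer features than box size, and the stroke-recovery step for Korean characters is not covered by this sketch.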
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Kim, JS., Kim, S.H., Yang, H.J., Son, H.J., Kim, W.P. (2007). Text Extraction for Spam-Mail Image Filtering Using a Text Color Estimation Technique. In: Okuno, H.G., Ali, M. (eds) New Trends in Applied Artificial Intelligence. IEA/AIE 2007. Lecture Notes in Computer Science, vol 4570. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73325-6_11
DOI: https://doi.org/10.1007/978-3-540-73325-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73322-5
Online ISBN: 978-3-540-73325-6
eBook Packages: Computer Science, Computer Science (R0)