Abstract
Most conventional character extraction methods consist of three steps: binarization (background determination), region segmentation, and region identification. Errors in binarization propagate to the segmentation and identification results. This is a particular problem for color documents printed with regions of different background colors: binarization cannot find an effective threshold, so the subsequent segmentation and identification steps do not work properly. In addition, conventional region segmentation methods are time-consuming for large document images, and conventional region identification methods operate bottom-up on the preceding segmentation results. This study presents an intelligent method that addresses these problems by integrating background determination, region segmentation, and region identification to extract characters from color documents with highlight regions. The results demonstrate that the proposed method is more effective and efficient than other methods in terms of binarization results, extraction results, and computational performance.
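The thresholding problem described above can be illustrated with a minimal sketch. It is not the method proposed in this paper: it uses plain Otsu thresholding on synthetic data, and the gray levels (240 for a white background, 120 for a highlight background, 20 for text) and the pixel counts are hypothetical values chosen only for illustration. Under these assumptions, a single global threshold labels the entire highlight background as text, whereas thresholding the highlight region on its own separates text from background.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance
    when pixels <= t form one class and pixels > t the other (Otsu)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of the gray levels of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break     # bright class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (m0 - m1) ** 2  # unnormalized
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def foreground_count(pixels, t):
    """Count pixels classified as text (dark), i.e. level <= t."""
    return sum(1 for p in pixels if p <= t)


# Hypothetical synthetic regions, each with 100 text pixels (level 20).
white_region = [240] * 900 + [20] * 100      # white background
highlight_region = [120] * 900 + [20] * 100  # highlighted background

global_t = otsu_threshold(white_region + highlight_region)
local_t = otsu_threshold(highlight_region)

# Text pixels found in the highlight region under each threshold.
global_fg = foreground_count(highlight_region, global_t)
local_fg = foreground_count(highlight_region, local_t)
print(global_t, global_fg)
print(local_t, local_fg)
```

In this toy case the global threshold falls between the highlight and white background levels, so every highlight pixel is counted as foreground. The per-region threshold recovers only the 100 text pixels, which is why background determination needs to account for regions with different background colors.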
This work was supported by the National Science Council, R.O.C., under Grant NSC 99-2221-E-133-002-.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tsai, CM. (2011). An Intelligent Method to Extract Characters in Color Document with Highlight Regions. In: Mehrotra, K.G., Mohan, C.K., Oh, J.C., Varshney, P.K., Ali, M. (eds) Modern Approaches in Applied Intelligence. IEA/AIE 2011. Lecture Notes in Computer Science, vol 6704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21827-9_15
DOI: https://doi.org/10.1007/978-3-642-21827-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21826-2
Online ISBN: 978-3-642-21827-9
eBook Packages: Computer Science, Computer Science (R0)