An Intelligent Method to Extract Characters in Color Document with Highlight Regions

  • Conference paper
Modern Approaches in Applied Intelligence (IEA/AIE 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6704)

Abstract

Most conventional character extraction methods consist of three steps: binarization (background determination), region segmentation, and region identification. Errors in binarization propagate to the segmentation and identification steps. This is a particular problem for color documents printed with regions of different background colors: binarization cannot find an effective threshold, so the subsequent segmentation and identification steps do not work properly. In addition, conventional region segmentation methods are time-consuming for large document images, and conventional region identification methods operate bottom-up on the preceding segmentation results. This study presents an intelligent method that addresses these problems by integrating background determination, region segmentation, and region identification to extract characters from color documents with highlight regions. The results demonstrate that the proposed method is more effective and efficient than other methods in terms of binarization results, extraction results, and computational performance.
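The thresholding problem described in the abstract can be illustrated with a small sketch. It is not the method proposed in the paper. It applies a plain global Otsu threshold to a synthetic grayscale page and compares it with thresholding each background region separately. The page layout, the gray levels (40 for text, 120 for the highlight band, 230 for the white background), and the helper names `make_page` and `binarize_by_region` are assumptions made for illustration, and the region boundaries are assumed to be known in advance.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level t that maximizes Otsu's between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                 # pixel count at or below each level
    w1 = hist.sum() - w0                 # pixel count above each level
    s0 = np.cumsum(hist * levels)        # intensity sum at or below each level
    mu0 = s0 / np.maximum(w0, 1)         # class means; guard against empty classes
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return int(np.argmax(between))

def make_page():
    """Synthetic page (assumed layout): a highlight band above a white area."""
    page = np.full((100, 100), 230, dtype=np.uint8)   # white background
    page[0:40, :] = 120                               # highlight band
    text = np.zeros(page.shape, dtype=bool)
    text[10:12, :] = True                             # text line inside the highlight
    text[60:63, :] = True                             # text line on the white area
    page[text] = 40                                   # dark text pixels
    return page, text

def binarize_by_region(gray, regions):
    """Threshold each row band separately; `regions` is a list of row slices."""
    mask = np.zeros(gray.shape, dtype=bool)
    for rows in regions:
        block = gray[rows]
        mask[rows] = block <= otsu_threshold(block)
    return mask

page, text = make_page()
global_mask = page <= otsu_threshold(page)
region_mask = binarize_by_region(page, [slice(0, 40), slice(40, 100)])

print("global threshold marks the whole highlight band as text:",
      bool(global_mask[0:40].all()))
print("per-region threshold recovers exactly the text pixels:",
      bool(np.array_equal(region_mask, text)))
```

On this synthetic page the single global threshold groups the highlight background with the text, while the per-region thresholds separate them. Real documents do not come with region boundaries, which is why background determination and region segmentation have to be solved together with binarization.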

This work was supported by the National Science Council, R.O.C., under Grant NSC 99-2221-E-133-002-.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tsai, CM. (2011). An Intelligent Method to Extract Characters in Color Document with Highlight Regions. In: Mehrotra, K.G., Mohan, C.K., Oh, J.C., Varshney, P.K., Ali, M. (eds) Modern Approaches in Applied Intelligence. IEA/AIE 2011. Lecture Notes in Computer Science, vol 6704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21827-9_15

  • DOI: https://doi.org/10.1007/978-3-642-21827-9_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21826-2

  • Online ISBN: 978-3-642-21827-9

  • eBook Packages: Computer Science, Computer Science (R0)
