Character pattern extraction from documents with complex backgrounds

  • Original Research Papers
International Journal on Document Analysis and Recognition

Abstract.

Recent progress in computer systems and printing devices has made it easier to produce printed documents with varied designs. Text is often printed on colored backgrounds, and sometimes on complex backgrounds such as photographs or computer graphics. Several methods have been developed for extracting character patterns from document images and scene images with complex backgrounds. However, these earlier methods are suitable only for rather large characters and often fail to extract small characters with thin strokes. This paper proposes a new method for extracting character patterns from document images with complex backgrounds. The method is based on local multilevel thresholding, pixel labeling, and region growing. This framework is very useful for extracting character patterns from badly illuminated document images. Performance on small character patterns is improved by suppressing the influence of mixed-color pixels around character edges. Experimental results show that the method can extract very small character patterns from main text blocks in various documents, separating characters from complex backgrounds, as long as the character strokes are more than about 1.5 pixels thick.
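
The abstract names three building blocks: local multilevel thresholding, pixel labeling, and region growing. The following is a minimal sketch of how such steps might fit together on a grayscale image, not the authors' algorithm. The function names, the equal-width binning inside each block, the block size, and the tolerance `tol` are all assumptions made for illustration; the paper's treatment of color and of mixed-color edge pixels is not reproduced here.

```python
from collections import deque


def local_multilevel_threshold(img, block=4, levels=3):
    """Replace each pixel by the mean gray value of its local class.

    Assumption: classes are equal-width bins between the minimum and
    maximum gray value of each block (a stand-in for a real
    multilevel thresholding rule).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            coords = [(y, x)
                      for y in range(by, min(by + block, h))
                      for x in range(bx, min(bx + block, w))]
            vals = [img[y][x] for y, x in coords]
            lo, hi = min(vals), max(vals)
            width = (hi - lo) / levels or 1.0  # flat block: one class
            bins = {}
            for (y, x), v in zip(coords, vals):
                k = min(int((v - lo) / width), levels - 1)
                bins.setdefault(k, []).append((y, x, v))
            for members in bins.values():
                mean = sum(v for _, _, v in members) / len(members)
                for y, x, _ in members:
                    out[y][x] = mean
    return out


def grow_regions(q, tol=10.0):
    """Label 4-connected regions whose neighboring values differ by <= tol."""
    h, w = len(q), len(q[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx]:
                continue  # already assigned to a region
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx]
                            and abs(q[ny][nx] - q[y][x]) <= tol):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy image: bright background (200) with a dark stroke (20), 2 pixels wide.
img = [[200] * 8 for _ in range(6)]
for y in range(1, 5):
    for x in (2, 3):
        img[y][x] = 20

quantized = local_multilevel_threshold(img)
labels, count = grow_regions(quantized)
```

On this toy image the stroke and the background end up in separate regions. A candidate character region could then be selected by size or contrast, which is again an assumption rather than something the abstract specifies.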

Additional information

Received July 23, 2001 / Accepted November 5, 2001

About this article

Cite this article

Goto, H., Aso, H. Character pattern extraction from documents with complex backgrounds. IJDAR 4, 258–268 (2002). https://doi.org/10.1007/s10032-001-0073-1

  • DOI: https://doi.org/10.1007/s10032-001-0073-1
