Script and language identification for handwritten document images

  • Original papers
International Journal on Document Analysis and Recognition

Abstract.

A system for automatically identifying the script used in a handwritten document image is described. The system was developed using a 496-document dataset representing six scripts, eight languages, and 279 writers. Documents were characterized by the mean, standard deviation, and skew of five connected component features. Linear discriminant analysis was used to classify new documents, and the classifier was tested using writer-sensitive cross-validation. Classification accuracy averaged 88% across the six scripts. The same method, applied within the Roman subcorpus, discriminated English and German documents with 85% accuracy.
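The document characterization and the evaluation protocol summarized above can be illustrated with a minimal sketch. It rests on several assumptions, since the abstract does not give the details: the five connected component features are not named here, so each component is simply a tuple of numbers; skew is taken to be the population third standardized moment; and "writer-sensitive" is read as keeping all documents by one writer on the same side of each split. The names `summarize`, `document_descriptor`, and `writer_grouped_folds` are hypothetical, and the linear discriminant classifier itself is not shown.

```python
import math
from collections import defaultdict


def summarize(values):
    """Return (mean, standard deviation, skew) for one feature's values.

    Assumption: population statistics; the abstract does not say which
    estimator the authors used.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0.0:
        # Skew is undefined for constant data; report 0.0 by convention.
        return mean, 0.0, 0.0
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    return mean, std, skew


def document_descriptor(components):
    """Build one vector per document from its connected components.

    `components` holds one equal-length feature tuple per component.
    With five features per component the result has 15 entries.
    """
    descriptor = []
    for feature_values in zip(*components):
        descriptor.extend(summarize(feature_values))
    return descriptor


def writer_grouped_folds(writers, n_folds):
    """Return (train_indices, test_indices) pairs, one per fold.

    Assumption: no writer may appear in both the training and the test
    set of the same split.
    """
    by_writer = defaultdict(list)
    for index, writer in enumerate(writers):
        by_writer[writer].append(index)

    # Assign writers with the most documents first, each to the fold
    # that currently holds the fewest documents.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(by_writer, key=lambda w: (-len(by_writer[w]), str(w)))
    for writer in ordered:
        min(folds, key=len).extend(by_writer[writer])

    splits = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(writers)) if i not in held]
        splits.append((train, sorted(held_out)))
    return splits
```

Grouping by writer matters because a dataset with 496 documents and 279 writers contains repeat writers; a plain random split could place the same hand in both training and test data and inflate the measured accuracy.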

Additional information

Received December 1, 1998 / Revised April 5, 1999

About this article

Cite this article

Hochberg, J., Bowers, K., Cannon, M. et al. Script and language identification for handwritten document images. IJDAR 2, 45–52 (1999). https://doi.org/10.1007/s100320050036

  • DOI: https://doi.org/10.1007/s100320050036
