Abstract.
A system for automatically identifying the script used in a handwritten document image is described. The system was developed using a 496-document dataset representing six scripts, eight languages, and 279 writers. Each document was characterized by the mean, standard deviation, and skew of five connected component features. Linear discriminant analysis was used to classify new documents, and the classifier was tested using writer-sensitive cross-validation. Classification accuracy averaged 88% across the six scripts. The same method, applied within the Roman subcorpus, discriminated English and German documents with 85% accuracy.
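The document characterization and evaluation split described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the five connected component features, so the code treats them as anonymous numeric columns; the function names `summarize`, `document_vector`, and `writer_folds` are hypothetical; and "writer-sensitive" is assumed here to mean that all documents by one writer fall in the same fold. The linear discriminant classifier itself is omitted.

```python
import math
from collections import defaultdict


def summarize(values):
    """Return (mean, standard deviation, skew) of one feature across a document."""
    n = len(values)
    mean = sum(values) / n
    # Population variance; the abstract does not say which estimator was used.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Moment-based skewness, taken as 0 when the feature is constant.
    skew = 0.0 if std == 0 else sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    return mean, std, skew


def document_vector(components):
    """Build one document's feature vector.

    components: list of equal-length tuples, one per connected component,
    each holding that component's feature values (five in the paper).
    Returns three statistics per feature, i.e. 15 numbers for five features.
    """
    vector = []
    for feature_values in zip(*components):  # iterate feature by feature
        vector.extend(summarize(feature_values))
    return vector


def writer_folds(writer_ids, n_folds):
    """Group document indices into folds so that no writer spans two folds.

    Assumption: this is what "writer-sensitive" cross-validation means.
    """
    writers = sorted(set(writer_ids))
    fold_of_writer = {w: i % n_folds for i, w in enumerate(writers)}
    folds = defaultdict(list)
    for doc_index, writer in enumerate(writer_ids):
        folds[fold_of_writer[writer]].append(doc_index)
    return [folds[i] for i in range(n_folds)]
```

Each resulting vector would then be passed to a linear discriminant classifier that is trained on all folds but one and tested on the held-out fold, so that a test document is never scored by a model that has seen its writer.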
Additional information
Received December 1, 1998 / Revised April 5, 1999
Cite this article
Hochberg, J., Bowers, K., Cannon, M. et al. Script and language identification for handwritten document images. IJDAR 2, 45–52 (1999). https://doi.org/10.1007/s100320050036
DOI: https://doi.org/10.1007/s100320050036