19 January 2009 On-line handwritten text categorization
Proceedings Volume 7247, Document Recognition and Retrieval XVI; 724709 (2009) https://doi.org/10.1117/12.804355
Event: IS&T/SPIE Electronic Imaging, 2009, San Jose, California, United States
Abstract
As new devices that accept or produce on-line documents emerge, facilities for managing such documents, such as topic spotting, are required. This means that we should be able to perform text categorization of on-line documents. The textual data in on-line documents can be extracted through on-line recognition, a process that introduces noise, i.e. errors, into the resulting text. This work reports experiments on the categorization of on-line handwritten documents based on their textual content. We analyze the effect of the word recognition rate on categorization performance by comparing the performance of a categorization system on texts obtained through on-line handwriting recognition with its performance on the same texts available as ground truth. Two categorization algorithms (kNN and SVM) are compared. A subset of the Reuters-21578 corpus consisting of more than 2000 handwritten documents was collected for this study. Results show that accuracy loss is not significant, and that precision loss is significant only for recall values of 60%-80%, depending on the noise level.
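The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the toy documents, the labels, the `<err>` placeholder token, and every function name below are assumptions made for illustration. Recognition noise is simulated by word substitution only, so reference and recognized texts stay aligned word by word, and a plain bag-of-words cosine kNN stands in for the categorizer.

```python
import math
import random
from collections import Counter

# Toy training set (hypothetical, not taken from Reuters-21578).
TRAIN = [
    ("wheat corn harvest grain export", "grain"),
    ("grain wheat crop tonnes harvest", "grain"),
    ("corn crop grain shipment tonnes", "grain"),
    ("bank interest rate money market", "money"),
    ("money supply bank dollar rate", "money"),
    ("dollar currency market interest bank", "money"),
]
TEST = [("wheat harvest tonnes", "grain"), ("bank rate dollar", "money")]


def bag_of_words(text):
    """Term-frequency vector of a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Majority label among the k training documents most similar to text."""
    query = bag_of_words(text)
    ranked = sorted(train, key=lambda ex: cosine(query, bag_of_words(ex[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def add_recognition_noise(text, error_rate, rng):
    """Simulate recognition errors: each word is replaced by a placeholder
    token with probability error_rate (substitutions only, an assumption)."""
    return " ".join("<err>" if rng.random() < error_rate else w
                    for w in text.split())


def word_recognition_rate(reference, hypothesis):
    """Fraction of reference words reproduced at the same position.
    Valid only when both texts have the same number of words."""
    ref, hyp = reference.split(), hypothesis.split()
    return sum(r == h for r, h in zip(ref, hyp)) / len(ref) if ref else 1.0


def accuracy(train, test, error_rate=0.0, seed=0):
    """Categorization accuracy on test texts degraded at error_rate."""
    rng = random.Random(seed)
    hits = sum(
        knn_predict(train, add_recognition_noise(text, error_rate, rng)) == label
        for text, label in test
    )
    return hits / len(test)


if __name__ == "__main__":
    for rate in (0.0, 0.2, 0.4):
        print(rate, accuracy(TRAIN, TEST, error_rate=rate))
```

Calling `accuracy` with `error_rate=0.0` plays the role of the ground-truth condition, while higher rates mimic lower word recognition rates. A real recognizer also produces insertions and deletions, which would require an edit-distance alignment rather than the positional comparison used here.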
© (2009) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Sebastián Peña Saldarriaga, Christian Viard-Gaudin, and Emmanuel Morin "On-line handwritten text categorization", Proc. SPIE 7247, Document Recognition and Retrieval XVI, 724709 (19 January 2009); https://doi.org/10.1117/12.804355
KEYWORDS
Algorithm development
Detection and tracking algorithms
Error analysis
Feature extraction
Optical character recognition
Signal processing
Systems modeling