24 March 2014
Form classification and retrieval using bag of words with shape features of line structures
Florian Kleber, Markus Diem, Robert Sablatnig
Proceedings Volume 9021, Document Recognition and Retrieval XXI; 902107 (2014) https://doi.org/10.1117/12.2037210
Event: IS&T/SPIE Electronic Imaging, 2014, San Francisco, California, United States
Abstract
This paper proposes a method for classifying and retrieving document forms that combines Bag of Words with newly introduced local shape features of form lines. In a preprocessing step, the document is binarized and the form lines (solid and dotted) are detected. The shape features are derived from this line information and describe local line structures such as line endings, crossings, and boxes. The dominant line structures form a vocabulary for each form class. Given the vocabulary, an occurrence histogram of the structures in a form document is computed and used for classification and retrieval. The proposed method was tested on a set of 489 documents covering 9 different form classes.
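The vocabulary and histogram steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that line detection has already reduced each form to a list of structure labels; the label names (`crossing`, `box`, `line_end`, `dotted_end`), the class names, the `top_k` cutoff, and the cosine-similarity matching against mean class histograms are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from collections import Counter
import math


def build_vocabulary(docs_by_class, top_k=2):
    """Collect the top_k most frequent structure labels of every class.

    Assumption: "dominant" structures are simply the most frequent ones.
    """
    vocab = []
    for label in sorted(docs_by_class):
        counts = Counter(s for doc in docs_by_class[label] for s in doc)
        for structure, _ in counts.most_common(top_k):
            if structure not in vocab:
                vocab.append(structure)
    return vocab


def histogram(doc, vocab):
    """Normalized occurrence histogram of vocabulary structures in one form."""
    counts = Counter(doc)
    total = sum(counts[w] for w in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[w] / total for w in vocab]


def cosine(a, b):
    """Cosine similarity of two histograms (0.0 if either one is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def class_prototypes(docs_by_class, vocab):
    """Mean histogram per class, used here as a simple class model."""
    protos = {}
    for label, docs in docs_by_class.items():
        hists = [histogram(d, vocab) for d in docs]
        protos[label] = [sum(col) / len(hists) for col in zip(*hists)]
    return protos


def classify(doc, vocab, protos):
    """Assign the class whose prototype histogram is most similar."""
    h = histogram(doc, vocab)
    return max(protos, key=lambda label: cosine(h, protos[label]))


def retrieve(query, corpus, vocab):
    """Rank corpus indices by histogram similarity to the query form."""
    h = histogram(query, vocab)
    return sorted(range(len(corpus)),
                  key=lambda i: cosine(h, histogram(corpus[i], vocab)),
                  reverse=True)


# Toy data (hypothetical): each form is a list of detected structure labels.
train = {
    "invoice": [["crossing", "crossing", "box", "line_end"],
                ["crossing", "box", "crossing"]],
    "survey": [["dotted_end", "dotted_end", "line_end"],
               ["dotted_end", "line_end", "line_end"]],
}
vocab = build_vocabulary(train)
protos = class_prototypes(train, vocab)
print(classify(["crossing", "box", "crossing", "crossing"], vocab, protos))
```

The same histogram serves both tasks in this sketch: classification compares a form against one prototype per class, while retrieval ranks individual stored forms against a query form.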
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Florian Kleber, Markus Diem, and Robert Sablatnig "Form classification and retrieval using bag of words with shape features of line structures", Proc. SPIE 9021, Document Recognition and Retrieval XXI, 902107 (24 March 2014); https://doi.org/10.1117/12.2037210
KEYWORDS
Associative arrays
Feature extraction
Solids
Fluctuations and noise
Binary data
Databases
Product engineering
