Investigation of feature selection for historical document layout analysis | IEEE Conference Publication | IEEE Xplore

Publisher: IEEE

Abstract:

In this paper we investigate the importance of individual features for the task of document layout analysis, in particular for the classification of document pixels. The feature set consists of numerous state-of-the-art features, including color, gradient, and local binary patterns (LBP). To deal with the high dimensionality of the feature set, we propose a cascade of an adapted forward selection and a genetic selection. We evaluated our feature selection method on three historical document datasets. For the classification we used machine learning methods that assign each pixel to one of four classes: periphery, background, text block, or decoration. The proposed cascading feature selection method reduced the number of features significantly while preserving the cross-validation performance. Furthermore, it selected fewer features than conventional feature selection methods at comparable performance. In our analysis we found that LBP features are consistently selected by all feature selection methods on all three datasets. This indicates that LBP features correlate with the pixel classes much more strongly than any other type of feature does. These findings offer a clue toward a paradigm for document layout analysis in general.
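To make the idea of a selection cascade concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how the forward selection was adapted, which classifier was used, or how the genetic stage was configured, so everything here is an assumption: plain greedy forward selection stands in for the adapted variant, a nearest-centroid classifier with 3-fold cross-validation stands in for the pixel classifier, and the helper names (`cv_accuracy`, `forward_selection`, `genetic_selection`, `make_toy_data`), the toy data, and all parameter values are hypothetical.

```python
import random

def cv_accuracy(X, y, feats, folds=3):
    """Cross-validated accuracy of a nearest-centroid classifier on `feats`."""
    if not feats:
        return 0.0
    correct = 0
    for k in range(folds):
        train = [i for i in range(len(X)) if i % folds != k]
        test = [i for i in range(len(X)) if i % folds == k]
        # Per-class mean vector, computed over the selected features only.
        centroids = {}
        for label in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
        for i in test:
            pred = min(centroids, key=lambda c: sum(
                (X[i][f] - m) ** 2 for f, m in zip(feats, centroids[c])))
            correct += pred == y[i]
    return correct / len(X)

def forward_selection(X, y, max_feats):
    """Stage 1 (assumed plain greedy): add the best feature until no gain."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_feats:
        score, f = max((cv_accuracy(X, y, selected + [f]), f) for f in remaining)
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected

def genetic_selection(X, y, pool, rng, pop_size=12, generations=15, penalty=0.01):
    """Stage 2: evolve bit masks over the stage-1 survivors in `pool`."""
    def fitness(mask):
        feats = [f for f, keep in zip(pool, mask) if keep]
        # Reward accuracy, lightly penalise the number of kept features.
        return cv_accuracy(X, y, feats) - penalty * len(feats)

    # Start from the full pool plus random subsets of it.
    population = [[1] * len(pool)] + [
        [rng.randint(0, 1) for _ in pool] for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # the fittest half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]       # uniform crossover
            child = [bit ^ (rng.random() < 0.1) for bit in child]  # bit-flip mutation
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [f for f, keep in zip(pool, best) if keep]

def make_toy_data(rng, n=120, n_feats=8):
    """Synthetic two-class data; only features 0 and 1 carry class signal."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_feats)]
        row[0] += 4 * label
        row[1] += 4 * label
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(0)
X, y = make_toy_data(rng)
candidates = forward_selection(X, y, max_feats=4)
final = genetic_selection(X, y, candidates, rng)
print("forward stage:", candidates, "after genetic stage:", final)
```

The point of cascading in this sketch is cost: the cheap greedy stage shrinks the search space, so the genetic stage only has to explore subsets of a handful of survivors rather than of the full feature set.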
Date of Conference: 14-17 October 2014
Date Added to IEEE Xplore: 08 January 2015
Conference Location: Paris, France
