Training on severely degraded text-line images | IEEE Conference Publication | IEEE Xplore

Training on severely degraded text-line images


Abstract:

We show that document image decoding (DID) supervised training algorithms, as a result of recent refinements, achieve high accuracy with low manual effort even under conditions of severe image degradation in both training and test data. We describe improvements in DID training of character template, set-width, and channel (noise) models. Large-scale experimental trials, using synthetically degraded images of text, have established two new and practically important advantages of DID algorithms: 1) high accuracy (> 99% characters correct) in decoding using models trained on even severely degraded images from the same distribution; and 2) greatly improved accuracy (< 1/10 the error rate) across a wide range of image degradations compared to untrained (idealized) models. This ability to train reliably on low-quality images that suffer from massive fragmentation and merging of characters, without the need for manual segmentation and labeling of character images, significantly reduces the manual effort of DID training.
Date of Conference: 06-06 August 2003
Date Added to IEEE Xplore: 08 September 2003
Print ISBN: 0-7695-1960-1
Conference Location: Edinburgh, UK
