A simple and effective figure caption detection system for old-style documents
24 January 2011
Zongyi Liu, Hanning Zhou
Proceedings Volume 7874, Document Recognition and Retrieval XVIII; 78740T (2011) https://doi.org/10.1117/12.872144
Event: IS&T/SPIE Electronic Imaging, 2011, San Francisco Airport, California, United States
Abstract
Identifying figure captions has wide applications in producing high-quality e-books, such as Kindle or iPad books. In this paper, we present a rule-based system to detect horizontal figure captions in old-style documents. Our algorithm consists of three steps: (i) segment images into regions of different types, such as text and figures; (ii) search for the best caption region candidate based on heuristic rules such as region alignment and distance; and (iii) expand the caption regions identified in step (ii) with their neighboring text regions in order to correct over-segmentation errors. We test our algorithm on 81 images collected from old-style books, each containing at least one figure region. We show that the approach correctly detects figure captions in images with different layouts, and we measure its performance in terms of both precision and recall.
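The abstract names the heuristics only at a high level, so the following is a minimal Python sketch of steps (ii) and (iii) plus the precision/recall measure, under assumptions of our own: the output of step (i) is given as axis-aligned boxes labeled "text" or "figure", y grows downward, and the scoring function, the thresholds (`max_gap`, `min_overlap`, `max_line_gap`) and every name below (`Region`, `best_caption`, `expand_caption`, `precision_recall`) are hypothetical illustrations, not the authors' rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Axis-aligned box from step (i); assumed input, y grows downward."""
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom
    kind: str  # "text" or "figure"


def h_overlap(a: Region, b: Region) -> float:
    """Fraction of the narrower region's width that a and b share horizontally."""
    inter = min(a.x1, b.x1) - max(a.x0, b.x0)
    narrower = min(a.x1 - a.x0, b.x1 - b.x0)
    return max(0.0, inter) / narrower if narrower > 0 else 0.0


def v_gap(a: Region, b: Region) -> float:
    """Vertical gap between a and b (0 if they overlap vertically)."""
    return max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))


def best_caption(fig: Region, regions: List[Region],
                 max_gap: float = 40.0, min_overlap: float = 0.5) -> Optional[Region]:
    """Step (ii) sketch: pick the nearest horizontally aligned text region
    lying entirely above or below the figure. Thresholds are hypothetical."""
    best, best_score = None, float("inf")
    for r in regions:
        if r.kind != "text":
            continue
        if not (r.y0 >= fig.y1 or r.y1 <= fig.y0):  # must be above or below
            continue
        gap = v_gap(fig, r)
        if gap > max_gap or h_overlap(fig, r) < min_overlap:
            continue
        # Hypothetical score: small distance first, then center alignment.
        center_offset = abs((r.x0 + r.x1) / 2 - (fig.x0 + fig.x1) / 2)
        score = gap + 0.1 * center_offset
        if score < best_score:
            best, best_score = r, score
    return best


def expand_caption(cap: Region, regions: List[Region],
                   max_line_gap: float = 8.0, min_overlap: float = 0.5) -> Region:
    """Step (iii) sketch: repeatedly merge text regions that sit just above or
    below the caption box and overlap it horizontally (undo over-segmentation)."""
    changed = True
    while changed:
        changed = False
        for r in regions:
            if r.kind != "text":
                continue
            inside = (r.x0 >= cap.x0 and r.x1 <= cap.x1
                      and r.y0 >= cap.y0 and r.y1 <= cap.y1)
            if inside:
                continue  # already absorbed (or is the caption itself)
            if v_gap(cap, r) <= max_line_gap and h_overlap(cap, r) >= min_overlap:
                # Grow the caption box to the union of both boxes.
                cap = Region(min(cap.x0, r.x0), min(cap.y0, r.y0),
                             max(cap.x1, r.x1), max(cap.y1, r.y1), "text")
                changed = True
    return cap


def precision_recall(tp: int, n_detected: int, n_truth: int) -> Tuple[float, float]:
    """Precision = TP / detections, recall = TP / ground-truth captions."""
    precision = tp / n_detected if n_detected else 0.0
    recall = tp / n_truth if n_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    fig = Region(100, 100, 400, 300, "figure")
    line1 = Region(120, 310, 380, 325, "text")  # first caption line
    line2 = Region(120, 330, 300, 345, "text")  # over-segmented second line
    body = Region(50, 400, 450, 600, "text")    # body paragraph further down
    regions = [fig, line1, line2, body]
    cap = best_caption(fig, regions)
    print(cap)                             # Region(x0=120, y0=310, x1=380, y1=325, kind='text')
    print(expand_caption(cap, regions))    # Region(x0=120, y0=310, x1=380, y1=345, kind='text')
```

In the usage example, step (ii) selects the first caption line because it is closest to the figure and centered under it, and step (iii) then absorbs the second line, which sits 5 units below, while leaving the body paragraph 55 units away untouched. Merging until no box changes guarantees termination, because each merge strictly enlarges the caption box and absorbs at least one new region.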
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zongyi Liu and Hanning Zhou "A simple and effective figure caption detection system for old-style documents", Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740T (24 January 2011); https://doi.org/10.1117/12.872144
KEYWORDS
Image segmentation

Optical character recognition

Image processing algorithms and systems

Image processing

Rule based systems

Detection and tracking algorithms

Error analysis
