Abstract
Table of contents (TOC) recognition is an essential task in processing book content for document retrieval applications. Existing methods focus on exploiting characteristic features of TOC page formats in specific types of books. However, we observe that many other general layout-based features of pages can also indicate whether a page is a TOC page or not. In this paper, we propose using a selection of layout-based features to improve TOC page recognition. To show the effectiveness of the proposed method, we conduct experiments on the ICDAR Book Structure Extraction datasets of 2009, 2011 and 2013, on which it improves on the state-of-the-art performance of the current approach, which relies on TOC-page-based features only.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, P.T., Nguyen, D.T. (2015). Improving Table of Contents Recognition Using Layout-Based Features. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_32
DOI: https://doi.org/10.1007/978-3-319-11680-8_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11679-2
Online ISBN: 978-3-319-11680-8
eBook Packages: Engineering, Engineering (R0)