Improving Table of Contents Recognition Using Layout-Based Features

Nguyen, Phuc Tri; Nguyen, Dang Tuan

doi:10.1007/978-3-319-11680-8_32

Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 326)

Abstract

Table of contents (TOC) recognition is an essential task in processing book content for document retrieval applications. Existing methods focus on exploiting the characteristic formatting of TOC pages in specific types of books. However, we observe that many other ordinary layout-based features of a page can also indicate whether it is a TOC page. In this paper, we propose using a selected set of layout-based features to improve TOC page recognition. To show the effectiveness of the proposed method, we conduct experiments on the ICDAR Book Structure Extraction datasets of 2009, 2011 and 2013, where it improves on the state-of-the-art performance of the current approach, which relies on TOC-page-specific features only.
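The abstract does not list the layout-based features themselves, so the following is only an illustrative sketch of what per-page layout features might look like. The function name `layout_features` and the three features shown (share of lines ending in a number, share of lines with dot leaders, mean line length) are assumptions chosen for illustration, not the authors' feature set.

```python
import re
from typing import Dict, List

# Hypothetical features for illustration only; the paper's actual
# feature set is not described in the abstract.
_TRAILING_NUMBER = re.compile(r"\b\d{1,4}\s*$")  # line ends with a short number
_DOT_LEADER = re.compile(r"\.{3,}")              # run of three or more dots


def layout_features(lines: List[str]) -> Dict[str, float]:
    """Compute simple layout statistics for one page of OCR text lines."""
    non_empty = [ln for ln in lines if ln.strip()]
    n = len(non_empty)
    if n == 0:
        return {
            "trailing_number_ratio": 0.0,
            "dot_leader_ratio": 0.0,
            "mean_line_length": 0.0,
        }
    trailing = sum(1 for ln in non_empty if _TRAILING_NUMBER.search(ln))
    leaders = sum(1 for ln in non_empty if _DOT_LEADER.search(ln))
    mean_len = sum(len(ln.strip()) for ln in non_empty) / n
    return {
        "trailing_number_ratio": trailing / n,
        "dot_leader_ratio": leaders / n,
        "mean_line_length": mean_len,
    }


toc_page = [
    "Contents",
    "1 Introduction ........ 1",
    "2 Methods ........ 15",
    "3 Results ........ 42",
]
body_page = [
    "This is ordinary body text that runs across the page.",
    "It does not end with a page number or use dot leaders.",
]

toc_feats = layout_features(toc_page)
body_feats = layout_features(body_page)
```

Feature dictionaries like these could then be turned into vectors and passed to any standard binary classifier that labels each page as TOC or non-TOC; which classifier and which features work best is an empirical question that this sketch does not address.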

Author information

Authors and Affiliations

Faculty of Computer Science, University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam
Phuc Tri Nguyen & Dang Tuan Nguyen

Corresponding author

Correspondence to Phuc Tri Nguyen.

Editor information

Editors and Affiliations

Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
Viet-Ha Nguyen
Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
Anh-Cuong Le
School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Van-Nam Huynh

About this paper

Cite this paper

Nguyen, P.T., Nguyen, D.T. (2015). Improving Table of Contents Recognition Using Layout-Based Features. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_32

DOI: https://doi.org/10.1007/978-3-319-11680-8_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11679-2
Online ISBN: 978-3-319-11680-8
eBook Packages: Engineering, Engineering (R0)
