Improving Table of Contents Recognition Using Layout-Based Features

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 326)

Abstract

Table of contents (TOC) recognition is an essential task in processing book content for document retrieval applications. Existing methods focus on exploiting characteristic information about TOC page formats in specific types of books. However, we observe that many other ordinary layout-based features of pages can also indicate whether a page is a TOC page. In this paper, we propose using selected layout-based features to improve TOC page recognition. To show the effectiveness of the proposed method, we conduct experiments on the ICDAR Book Structure Extraction datasets of 2009, 2011 and 2013, on which it improves on the state-of-the-art performance of the current approach, which relies on TOC-page-based features only.
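The abstract does not list which layout-based features the paper selects, so the following is only a minimal sketch of what page-level layout features might look like, not the authors' feature set. The function name `layout_features`, the two regular expressions, and the chosen statistics are all hypothetical; the input is assumed to be the OCR text lines of a single page.

```python
import re
from statistics import mean

# Hypothetical patterns, chosen for illustration only; the paper's
# actual features are not given in the abstract.
TRAILING_NUMBER = re.compile(r"\s\d{1,4}$")  # line ends with a page-number-like token
DOT_LEADER = re.compile(r"\.{3,}")           # run of dots joining a title to a page number


def layout_features(lines):
    """Return simple layout statistics for one page of OCR text lines."""
    # Ignore blank lines and trailing whitespace.
    rows = [ln.rstrip() for ln in lines if ln.strip()]
    if not rows:
        return {"n_lines": 0, "trailing_number_ratio": 0.0,
                "dot_leader_ratio": 0.0, "mean_line_length": 0.0}
    n = len(rows)
    return {
        "n_lines": n,
        "trailing_number_ratio": sum(bool(TRAILING_NUMBER.search(r)) for r in rows) / n,
        "dot_leader_ratio": sum(bool(DOT_LEADER.search(r)) for r in rows) / n,
        "mean_line_length": mean(len(r) for r in rows),
    }


# Made-up example pages (not taken from the ICDAR datasets).
toc_page = [
    "Contents",
    "1 Introduction ........ 1",
    "2 Methods ............. 15",
    "3 Results ............. 42",
]
body_page = [
    "This chapter introduces the problem of",
    "recognising structure in scanned books.",
    "We begin with a short survey.",
]

toc_feats = layout_features(toc_page)
body_feats = layout_features(body_page)
```

Feature dictionaries of this kind could be turned into vectors and passed to any binary classifier that labels a page as TOC or non-TOC; which classifier and which features actually work best is a question the full paper addresses, not this sketch.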



Author information

Corresponding author

Correspondence to Phuc Tri Nguyen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, P.T., Nguyen, D.T. (2015). Improving Table of Contents Recognition Using Layout-Based Features. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_32

  • DOI: https://doi.org/10.1007/978-3-319-11680-8_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11679-2

  • Online ISBN: 978-3-319-11680-8

  • eBook Packages: Engineering, Engineering (R0)
