Extraction of Referential Heading-Entries in Recognized Table of Contents Pages

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 348)

Abstract

This paper presents our research on extracting referential heading-entries from recognized table of contents (TOC) pages. The task faces two issues: the complexity of TOC layouts (e.g., a referential heading-entry can span one or many lines and may be accompanied by decorative text) and text errors introduced by OCR processing in the training data. Our approach uses several layout-based and content-based features to classify the textual lines of TOC pages. We also propose synthesis rules that combine related classified lines into referential heading-entries. Experiments are conducted on the ICDAR Book Structure Extraction datasets of 2009, 2011, and 2013. The results show that the proposed approach is more efficient than previous methods for extracting referential heading-entries.
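The abstract describes two stages: classifying each TOC line from layout and content features, then applying synthesis rules that merge classified lines into heading-entries. The sketch below illustrates that pipeline under loudly stated assumptions. It is not the authors' implementation. The `Line` record, the feature names, the three labels (`noise`, `part`, `end`), the dot-leader page-number pattern, and the hanging-indent rule are all hypothetical. A hand-written `classify` function stands in for the learned line classifier the abstract describes.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical input record: one OCR'd text line plus the x-coordinate of its left edge.
@dataclass
class Line:
    text: str
    x: int

# Assumed pattern: a title, then dot leaders and/or spaces, then an Arabic page number.
TRAILING_PAGE = re.compile(r"^(?P<title>.*?\S)[\s.]+(?P<page>\d{1,4})$")
# Assumed header words that carry no heading content.
HEADER_WORDS = {"CONTENTS", "TABLE OF CONTENTS", "PAGE"}


def features(line: Line, left_margin: int) -> dict:
    """Toy layout-based and content-based features for one line (hypothetical)."""
    text = line.text.strip()
    return {
        "indent": line.x - left_margin,                              # layout feature
        "has_page_number": TRAILING_PAGE.match(text) is not None,    # content feature
        "is_header_word": text.upper() in HEADER_WORDS,              # content feature
        "letter_count": sum(c.isalpha() for c in text),              # content feature
    }


def classify(f: dict) -> str:
    """Rule-based stand-in for a trained line classifier."""
    if f["is_header_word"] or f["letter_count"] == 0:
        return "noise"  # decorative or header text
    if f["has_page_number"]:
        return "end"    # line that closes a heading-entry
    return "part"       # first or middle line of a multi-line entry


def synthesize(lines: list[Line]) -> list[tuple[str, Optional[int]]]:
    """Merge classified lines into (heading, page) entries; page is None if absent."""
    if not lines:
        return []
    left_margin = min(line.x for line in lines)
    entries: list[tuple[str, Optional[int]]] = []
    buffer: list[str] = []
    start_indent = 0
    for line in lines:
        f = features(line, left_margin)
        label = classify(f)
        if label == "noise":
            continue
        # Assumption (hanging indent): continuation lines sit deeper than the entry's
        # first line, so a line at the same or a smaller indent starts a new entry.
        if buffer and f["indent"] <= start_indent:
            entries.append((" ".join(buffer), None))  # flush an entry with no page number
            buffer = []
        if not buffer:
            start_indent = f["indent"]
        text = line.text.strip()
        if label == "part":
            buffer.append(text)
        else:
            m = TRAILING_PAGE.match(text)
            buffer.append(m["title"])
            entries.append((" ".join(buffer), int(m["page"])))
            buffer = []
    if buffer:
        entries.append((" ".join(buffer), None))
    return entries
```

For example, given a centered "CONTENTS" header, a single-line entry, a two-line entry with a hanging indent, and a page-less part heading, `synthesize` drops the header, joins the two lines of the second entry, and emits the part heading with page `None`. The hand-written rules are brittle: a heading line such as "Chapter 1" on its own matches the page-number pattern, and OCR confusions like "l2" for "12" break it. A classifier trained on richer features is meant to absorb exactly these cases.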

Author information

Correspondence to Phuc Tri Nguyen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, P.T., Nguyen, D.T. (2015). Extraction of Referential Heading-Entries in Recognized Table of Contents Pages. In: Silhavy, R., Senkerik, R., Oplatkova, Z., Prokopova, Z., Silhavy, P. (eds) Intelligent Systems in Cybernetics and Automation Theory. CSOC 2015. Advances in Intelligent Systems and Computing, vol 348. Springer, Cham. https://doi.org/10.1007/978-3-319-18503-3_1

  • DOI: https://doi.org/10.1007/978-3-319-18503-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18502-6

  • Online ISBN: 978-3-319-18503-3

  • eBook Packages: Engineering, Engineering (R0)