On tables of contents and how to recognize them

  • Original Paper
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

We present a method for structuring a document according to the information present in its organizational tables: the table of contents, tables of figures, etc. The method takes a two-step approach that leverages both functional and formal (layout-based) knowledge. A functional definition of an organizational table, based on five properties, provides a first solution, which a second step improves by automatically learning the form of the table of contents. We also report on the robustness and performance of the method and illustrate its use in a real conversion case.

Author information

Correspondence to Hervé Déjean.

About this article

Cite this article

Déjean, H., Meunier, JL. On tables of contents and how to recognize them. IJDAR 12, 1–20 (2009). https://doi.org/10.1007/s10032-009-0078-8

  • DOI: https://doi.org/10.1007/s10032-009-0078-8
