Recognition of table of contents for electronic library consulting

Belaïd, A.

doi:10.1007/PL00013572

SI: Document Analysis for Office Systems
Published: August 2001

Volume 4, pages 35–45 (2001)

International Journal on Document Analysis and Recognition

A. Belaïd¹

Abstract.

This paper describes a labelling approach for the automatic recognition of tables of contents (ToC). A prototype built on this approach is used for the electronic consultation of scientific papers in Calliope, a digital library system. The method operates on a roughly structured ASCII file produced by OCR, and recognition is performed by labelling the text without any a priori model. Labelling is based on part-of-speech (PoS) tagging, initiated by a primary labelling of text components using specific dictionaries. Significant tags are first grouped into homogeneous classes according to their grammatical categories and then reduced to canonical forms corresponding to the article fields “title” and “authors”. Unlabelled tokens are assigned to one field or the other either by applying PoS correction rules or by using a structure model generated from well-detected articles. The prototype performs well across different ToC layouts and character recognition qualities: without manual intervention, it obtained a 96.3% rate of correct segmentation on 38 journals comprising 2,020 articles, together with a 93.0% rate of correct field extraction.
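The labelling pipeline summarised above can be illustrated with a toy Python sketch. It is not the Calliope implementation: the tiny dictionaries, the tag names (`INITIAL`, `FIRST`, `WORD`, `PAGE`, `PUNCT`, `UNK`), the single correction rule (an unknown token next to an initial or first name is treated as a surname) and the representation of the structure model (the majority order of the “authors” and “title” fields among entries with no unknown tokens left) are all hypothetical simplifications chosen only to show the sequence of steps.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries; a real system would use large lexicons.
FIRST_NAMES = {"anne", "john", "maria", "pierre"}
COMMON_WORDS = {"a", "an", "contents", "digital", "for", "library",
                "of", "on", "recognition", "table", "tables", "the"}
AUTHOR_TAGS = {"INITIAL", "FIRST"}
# Hypothetical grouping of primary tags into homogeneous classes.
CLASS_OF = {"INITIAL": "authors", "FIRST": "authors", "WORD": "title",
            "PAGE": "page", "PUNCT": "sep", "UNK": "UNK"}

def tokenize(line):
    # Initials such as "J." stay whole; other punctuation is split off.
    return re.findall(r"[A-Z]\.|\w+|[^\w\s]", line)

def primary_tag(tok):
    """Primary labelling from token shape and dictionary lookups."""
    if re.fullmatch(r"[A-Z]\.", tok):
        return "INITIAL"
    if tok.isdigit():
        return "PAGE"
    if tok.lower() in FIRST_NAMES:
        return "FIRST"
    if tok.lower() in COMMON_WORDS:
        return "WORD"
    if not tok.isalnum():
        return "PUNCT"
    return "UNK"

def correct(tags):
    """Hypothetical correction rule: an unknown token next to an initial
    or a first name is taken to be a surname and joins the authors class."""
    classes = [CLASS_OF[t] for t in tags]
    for i, t in enumerate(tags):
        neighbours = tags[max(i - 1, 0):i] + tags[i + 1:i + 2]
        if t == "UNK" and any(n in AUTHOR_TAGS for n in neighbours):
            classes[i] = "authors"
    return classes

def field_order(classes):
    """Order in which the authors and title fields first appear."""
    return tuple(dict.fromkeys(c for c in classes if c in ("authors", "title")))

def learn_model(all_classes):
    """Toy structure model: majority field order over well-detected entries."""
    orders = [field_order(c) for c in all_classes if "UNK" not in c]
    orders = [o for o in orders if len(o) == 2]
    # Fallback order when no entry is well detected (an assumption).
    return Counter(orders).most_common(1)[0][0] if orders else ("authors", "title")

def resolve(classes, order):
    """Place each remaining UNK token on one side of the field boundary
    implied by the learned order."""
    first, second = order
    firsts = [i for i, c in enumerate(classes) if c == first]
    boundary = firsts[-1] + 1 if firsts else 0
    return [(first if i < boundary else second) if c == "UNK" else c
            for i, c in enumerate(classes)]

def canonical(tokens, classes):
    """Reduce labelled tokens to the canonical fields of one ToC entry."""
    fields = {"authors": [], "title": [], "page": []}
    current = None
    for tok, cls in zip(tokens, classes):
        if cls == "sep":
            if current:  # punctuation stays with the preceding field
                fields[current].append(tok)
            continue
        current = cls
        fields[cls].append(tok)
    return {name: re.sub(r" ([,;:.])", r"\1", " ".join(toks)).strip(" ,;:")
            for name, toks in fields.items()}

def recognise(lines):
    tokenised = [tokenize(line) for line in lines]
    classes = [correct([primary_tag(t) for t in toks]) for toks in tokenised]
    order = learn_model(classes)
    return [canonical(toks, resolve(cls, order))
            for toks, cls in zip(tokenised, classes)]

toc = [
    "J. Martin, A. Dupont: Recognition of tables of contents 35",
    "Anne Moreau: The digital library 47",
    "K. Lee: Calliope digital library 12",
]
for entry in recognise(toc):
    print(entry)
```

In the last entry, “Calliope” is in neither toy dictionary and is not adjacent to an author token, so the correction rule leaves it unlabelled; the structure model learned from the first two (well-detected) entries, authors before title, then places it in the title. A practical system would need much richer dictionaries and rules, for example to handle multi-word surnames.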

Author information

Authors and Affiliations

LORIA-CNRS, Campus Scientifique, B.P. 239, 54506 Vandoeuvre-lès-Nancy, France; Tel.: +33-3-83592082, Fax: +33-3-83413071, e-mail: Abdel.Belaid@loria.fr
A. Belaïd

Additional information

Received April 5, 2000 / Revised February 19, 2001

About this article

Cite this article

Belaïd, A. Recognition of table of contents for electronic library consulting. IJDAR 4, 35–45 (2001). https://doi.org/10.1007/PL00013572

Issue Date: August 2001
DOI: https://doi.org/10.1007/PL00013572

Key words: Calliope project – Digital library – Table of contents recognition – Part-of-speech tagging – OCR combination

Similar content being viewed by others

A Clustering Approach Combining Lines and Text Detection for Table Extraction

Automatic Identification of Table Contents in Electronic Component Specifications of EDA

Automated Table Understanding Using Stub Patterns
