DOI: 10.1145/2960811.2967162
short-paper

A PDF Wrapper for Table Processing

Published: 13 September 2016

ABSTRACT

We propose a PDF document wrapper system that is specifically targeted at table processing applications. We (i) review the PDF specification and identify particular challenges from the table processing point of view, and (ii) specify a table-oriented document model containing the atomic elements required for table extraction and understanding applications. Our evaluation showed that the wrapper detected important features such as page columns, bullets, and numbering with over 90% accuracy on all measures, leading to better table locating and segmenting.


    • Published in

      DocEng '16: Proceedings of the 2016 ACM Symposium on Document Engineering
      September 2016
      222 pages
      ISBN: 9781450344388
      DOI: 10.1145/2960811

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      DocEng '16 Paper Acceptance Rate: 11 of 35 submissions, 31%. Overall Acceptance Rate: 178 of 537 submissions, 33%.
