DOI: 10.1145/2361354.2361384
research-article

Scientific table type classification in digital library

Published: 04 September 2012

ABSTRACT

Tables are ubiquitous in digital libraries and on the Web, where they serve a variety of data delivery and document formatting goals. For example, scientific documents widely use tables to present experimental results or statistical data in condensed form. Identifying and organizing tables of different types is a necessary step toward better table understanding and toward data sharing and reuse. This paper makes three contributions: 1) we propose an Introduction, Methods, Results, and Discussion (IMRAD)-based functional classification of tables in scientific documents; 2) we introduce a fine-grained table taxonomy based on extensive observation and investigation of tables in digital libraries; and 3) we investigate table characteristics and classify tables automatically according to the defined taxonomy. Preliminary experimental results show that our table taxonomy, combined with salient features, can significantly improve scientific table classification performance.


        • Published in

          DocEng '12: Proceedings of the 2012 ACM symposium on Document engineering
          September 2012
          256 pages
          ISBN: 9781450311168
          DOI: 10.1145/2361354

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States




          Acceptance Rates

          Overall Acceptance Rate: 178 of 537 submissions, 33%
