DOI: 10.1145/2838931.2838941
short-paper

Towards Searching Amongst Tables

Published: 08 December 2015

ABSTRACT

An increasing number of data sets are being published online, in institutional or government repositories as well as by individual researchers, journalists, and others. These data are often represented as tables of various kinds; however, repositories offer poor search over and inside tables. It is difficult for a user to tell from a repository's portal whether a useful dataset is available, and this problem is only likely to get worse.

We describe this problem, and demonstrate that the naïve approach of full-text search is not appropriate. We describe an alternative, based on inferring types of data and indexing columns as a unit, and demonstrate some improvements in early success, especially when long captions are not available.
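
The idea of inferring data types and indexing columns as a unit can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no details, so the type heuristics, the function names (`infer_type`, `index_columns`, `search`), and the input layout (a dict of tables, each a dict of column name to cell strings) are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def infer_type(values):
    """Guess a coarse type for one column (hypothetical heuristic)."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "empty"
    if all(re.fullmatch(r"[-+]?\d+(\.\d+)?", c) for c in cells):
        return "number"
    if all(re.fullmatch(r"\d{4}-\d{2}-\d{2}", c) for c in cells):
        return "date"
    return "text"


def index_columns(tables):
    """Build an inverted index whose postings are (table_id, column_name).

    Each column is the retrieval unit: header tokens are always indexed,
    and cell tokens are indexed only for columns inferred as text.
    """
    index = defaultdict(set)
    types = {}
    for table_id, columns in tables.items():
        for name, values in columns.items():
            key = (table_id, name)
            types[key] = infer_type(values)
            for token in re.findall(r"\w+", name.lower()):
                index[token].add(key)
            if types[key] == "text":
                for value in values:
                    for token in re.findall(r"\w+", value.lower()):
                        index[token].add(key)
    return index, types


def search(index, types, query, want_type=None):
    """Return columns matching every query token, optionally filtered by type."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return []
    hits = set.intersection(*[index.get(t, set()) for t in tokens])
    if want_type is not None:
        hits = {key for key in hits if types[key] == want_type}
    return sorted(hits)
```

Because a hit names a table and a column rather than a whole document, a query can match a header or a cell value even when the table has no caption, and a caller can restrict results to, say, numeric columns via `want_type`.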

    • Published in

      ADCS '15: Proceedings of the 20th Australasian Document Computing Symposium
      December 2015
      72 pages
      ISBN:9781450340403
      DOI:10.1145/2838931

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • short-paper
      • Research
      • Refereed limited

      Acceptance Rates

      ADCS '15 Paper Acceptance Rate: 5 of 14 submissions, 36%
      Overall Acceptance Rate: 30 of 57 submissions, 53%
