A toolkit to facilitate the querying and integration of tabular data from semistructured documents

  • Conference paper
Advances in Databases (BNCOD 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1405)

Abstract

The advent of the World Wide Web (WWW) has created a growing need to link information held in databases and files with information held in other types of structure [2]. This poster presents a toolkit that enables information held in tables within documents to be linked with data held in conventional databases. The toolkit addresses the problems of:

  1. extracting tabular data from various types of document, e.g. semi-structured text [4], HTML and spreadsheets

  2. providing a standard interface to these tables via a standard query language

  3. extracting useful table meta-data from the table and accompanying text.
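
The three problems above can be illustrated with a minimal sketch. It is not the authors' toolkit, whose design is not described here. It assumes an HTML source, uses SQL over an in-memory SQLite database as a stand-in for the "standard query language", and treats the header row as the only table meta-data. The names `TableExtractor`, `load_html_table`, `HTML_DOC`, and the sample data are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the cell text of every <tr> in the document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def load_html_table(html, conn, table_name):
    """Extract the table, expose it as a SQL table, and return its column names."""
    parser = TableExtractor()
    parser.feed(html)
    header, *body = parser.rows  # assumption: the first row holds column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', body)
    return header


# Hypothetical document containing a table.
HTML_DOC = """
<p>Table 1: Staff by department.</p>
<table>
  <tr><th>name</th><th>dept</th></tr>
  <tr><td>Ada</td><td>Databases</td></tr>
  <tr><td>Grace</td><td>Compilers</td></tr>
</table>
"""

conn = sqlite3.connect(":memory:")

# A conventional database table to link against (hypothetical data).
conn.execute("CREATE TABLE budgets (dept TEXT, budget INTEGER)")
conn.executemany(
    "INSERT INTO budgets VALUES (?, ?)",
    [("Databases", 100), ("Compilers", 80)],
)

columns = load_html_table(HTML_DOC, conn, "staff")

# Link the document table with the database table through one SQL query.
joined = conn.execute(
    "SELECT s.name, b.budget FROM staff s "
    "JOIN budgets b ON s.dept = b.dept ORDER BY s.name"
).fetchall()
```

Once the document table sits behind the same SQL interface as the database table, linking the two reduces to an ordinary join. Plain-text and spreadsheet sources would need their own extractors in place of `TableExtractor`.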

References

  1. Wrapper Generation for Semi-structured Internet Sources. Ashish, N & Knoblock, C. University of Southern California.

  2. Learning to Extract Text-based Information from the World Wide Web. Soderland, S. University of Washington. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97).

  3. Wrapper Induction for Information Extraction. Kushmerick, N, Weld, D, Doorenbos, R. Proceedings of IJCAI 97.

  4. Using Natural Language Processing for Identifying and Interpreting Tables in Plain Text. Douglas, S, Hurst, M, Quinn, D. December 1994.

Editor information

Suzanne M. Embury, Nicholas J. Fiddian, W. Alex Gray, Andrew C. Jones

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hodge, L.E., Gray, W.A., Fiddian, N.J. (1998). A toolkit to facilitate the querying and integration of tabular data from semistructured documents. In: Embury, S.M., Fiddian, N.J., Gray, W.A., Jones, A.C. (eds) Advances in Databases. BNCOD 1998. Lecture Notes in Computer Science, vol 1405. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053482

  • DOI: https://doi.org/10.1007/BFb0053482

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64659-4

  • Online ISBN: 978-3-540-69112-9

  • eBook Packages: Springer Book Archive