Abstract
With the advent of the World Wide Web (WWW), there is a growing need to link information held in databases and files with information held in other types of structure [2]. This poster presents a toolkit that enables information held in tables within documents to be linked with data held in conventional databases. The toolkit addresses the problems of:
• extracting tabular data from various types of document, e.g. semi-structured text [4], HTML and spreadsheets
• providing a standard interface to these tables via a standard query language
• extracting useful table meta-data from the table and accompanying text.
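The first two points can be illustrated with a minimal sketch. It is not the toolkit described in this poster: it assumes SQL as the standard query language and uses only the Python standard library (`html.parser`, `sqlite3`). The names `TableExtractor`, `load_table`, `stock` and `prices`, and the sample data, are hypothetical and chosen for illustration only.

```python
import sqlite3
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in the document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def load_table(conn, name, rows):
    """Expose extracted rows as a SQL table; the first row is taken as the header."""
    header, *body = rows
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{name}" ({cols})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', body)


# Made-up document fragment containing a table.
HTML = """
<table>
  <tr><th>item</th><th>quantity</th></tr>
  <tr><td>widget</td><td>12</td></tr>
  <tr><td>gadget</td><td>3</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(HTML)

conn = sqlite3.connect(":memory:")
load_table(conn, "stock", parser.rows)

# A "conventional" database table to link against (also made up).
conn.execute("CREATE TABLE prices (item TEXT, price REAL)")
conn.execute("INSERT INTO prices VALUES ('widget', 2.5)")

# Query the document table through SQL; cell values are stored as text,
# so the numeric comparison needs an explicit cast.
in_stock = conn.execute(
    "SELECT item FROM stock WHERE CAST(quantity AS INTEGER) > 5"
).fetchall()

# Link the document table with the database table.
linked = conn.execute(
    "SELECT s.item, s.quantity, p.price "
    "FROM stock s JOIN prices p ON p.item = s.item"
).fetchall()

print(in_stock)
print(linked)
```

Treating the first row as the header is itself a simple form of table meta-data extraction. Recovering meta-data from the accompanying text, as the third point describes, would need considerably more than this sketch shows.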
References
[1] Wrapper Generation for Semi-structured Internet Sources. Ashish, N., Knoblock, C. University of Southern California.
[2] Learning to Extract Text-based Information from the World Wide Web. Soderland, S. University of Washington. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97).
[3] Wrapper Induction for Information Extraction. Kushmerick, N., Weld, D., Doorenbos, R. In Proceedings of IJCAI 97.
[4] Using Natural Language Processing for Identifying and Interpreting Tables in Plain Text. Douglas, S., Hurst, M., Quinn, D. December 1994.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hodge, L.E., Gray, W.A., Fiddian, N.J. (1998). A toolkit to facilitate the querying and integration of tabular data from semistructured documents. In: Embury, S.M., Fiddian, N.J., Gray, W.A., Jones, A.C. (eds) Advances in Databases. BNCOD 1998. Lecture Notes in Computer Science, vol 1405. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053482
DOI: https://doi.org/10.1007/BFb0053482
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64659-4
Online ISBN: 978-3-540-69112-9
eBook Packages: Springer Book Archive