Abstract
Legacy data in many mature descriptive sciences is distributed across multiple text descriptions. The challenge is both to extract this data and to correlate it once extracted. The MultiFlora system does this using an established Information Extraction system, tuned to the domain of botany and integrated with a formal ontology that structures and stores the data. A range of output formats is supported through the W3C RDFS standard, making it simple to populate a database as desired.
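The correlate-then-populate idea described above can be illustrated with a minimal sketch. It is not MultiFlora's implementation: it uses neither the Information Extraction system nor RDFS, and it assumes the extraction step has already produced simple (source, taxon, feature, value) records. The names `EXTRACTED`, `correlate`, `populate`, and the `fact` table are hypothetical, and the sample values are invented placeholders rather than data from the paper.

```python
import sqlite3
from collections import defaultdict

# Hypothetical extractor output: (source, taxon, feature, value).
# All values are invented placeholders, not data from the paper.
EXTRACTED = [
    ("flora_1", "Taxon A", "petal colour", "yellow"),
    ("flora_2", "Taxon A", "petal colour", "yellow"),
    ("flora_2", "Taxon A", "petal count", "5"),
    ("flora_1", "Taxon B", "petal colour", "white"),
]


def correlate(records):
    """Group values by (taxon, feature), keeping the set of sources for each value."""
    merged = defaultdict(lambda: defaultdict(set))
    for source, taxon, feature, value in records:
        merged[(taxon, feature)][value].add(source)
    return merged


def populate(conn, merged):
    """Write one row per distinct (taxon, feature, value); return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact "
        "(taxon TEXT, feature TEXT, value TEXT, n_sources INTEGER)"
    )
    rows = [
        (taxon, feature, value, len(sources))
        for (taxon, feature), values in sorted(merged.items())
        for value, sources in sorted(values.items())
    ]
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    populate(conn, correlate(EXTRACTED))
    for row in conn.execute("SELECT * FROM fact ORDER BY taxon, feature"):
        print(row)
```

Recording how many sources agree on a value is one simple way to keep the evidence from parallel texts visible in the database; a real system would resolve terminology against the ontology before merging.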
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wood, M.M., Lydon, S.J., Tablan, V., Maynard, D., Cunningham, H. (2004). Populating a Database from Parallel Texts Using Ontology-Based Information Extraction. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_22
DOI: https://doi.org/10.1007/978-3-540-27779-8_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22564-5
Online ISBN: 978-3-540-27779-8
eBook Packages: Springer Book Archive