Abstract
Database prototyping is a technique widely used both to validate user requirements and to verify certain application functionality. These tasks usually require populating the underlying data structures with sample data that, in addition, may need to satisfy certain constraints. Although some existing approaches have already automated this population task by means of random data generation, the lack of semantic meaning in the resulting data can hinder both the user validation task and the designer verification task.
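To make the limitation of random population concrete, the following is a minimal Python sketch of that baseline. It is our own illustration, not the method of any particular existing tool. The `Attribute` descriptor, the attribute restrictions and the `Customer`-like attribute list are hypothetical: each attribute of a conceptual-model class is filled with a random value that respects simple restrictions (string length, integer range).

```python
import random
import string
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    """Hypothetical descriptor for one attribute of a conceptual-model class."""
    name: str
    kind: str                  # "str" or "int" in this sketch
    max_len: int = 10          # length restriction for string attributes
    low: Optional[int] = None  # lower bound for integer attributes
    high: Optional[int] = None # upper bound for integer attributes


def random_value(attr: Attribute, rng: random.Random):
    """Draw a random value that satisfies the attribute's restrictions."""
    if attr.kind == "int":
        low = 0 if attr.low is None else attr.low
        high = 100 if attr.high is None else attr.high
        return rng.randint(low, high)  # inclusive on both ends
    length = rng.randint(1, attr.max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def populate_randomly(attrs, n_rows, seed=0):
    """Return n_rows records, each with one random value per attribute."""
    rng = random.Random(seed)  # seeded so prototypes are reproducible
    return [{a.name: random_value(a, rng) for a in attrs} for _ in range(n_rows)]


# Hypothetical attributes of a "Customer" class
customer = [
    Attribute("name", "str", max_len=12),
    Attribute("city", "str", max_len=15),
    Attribute("age", "int", low=18, high=90),
]

for row in populate_randomly(customer, 3):
    print(row)
```

Every generated record satisfies the declared restrictions, yet values such as a random lowercase string in the `name` or `city` column carry no meaning for a user reviewing the prototype.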
To address this lack of semantic meaning and make the resulting prototypes more intuitive, this paper presents a population system that, starting from the information contained in a UML-compliant Domain Conceptual Model, applies Information Extraction techniques to compile meaningful information sets from texts available on the Internet. The system is based on the semantic information extracted from the EuroWordNet (EWN) lexical resource and includes, among other features, a named entity recognition system and an ontology that speed up the prototyping process and improve the quality of the sample data.
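The overall idea (map each model attribute to a semantic category, recognise entities of that category in text, and use them as sample values) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the system described in the paper: a small hand-written dictionary (`ATTRIBUTE_CATEGORY`) stands in for EWN and the ontology, a handful of hypothetical cue-word regular expressions (`CUE_PATTERNS`) stand in for the named entity recognition system, and two inline sentences stand in for texts retrieved from the Internet.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the lexical resource/ontology: maps an attribute
# name from the conceptual model to a named-entity category.
ATTRIBUTE_CATEGORY = {
    "name": "PERSON",
    "city": "LOCATION",
    "employer": "ORGANIZATION",
}

# Toy recogniser (an assumption, not the paper's NER system): a sequence of
# capitalised words following a contextual cue word.
CUE_PATTERNS = {
    "PERSON": re.compile(r"\b(?:Mr\.|Mrs\.|Dr\.)\s+((?:[A-Z][a-z]+\s?)+)"),
    "LOCATION": re.compile(r"\b(?:in|from)\s+((?:[A-Z][a-z]+\s?)+)"),
    "ORGANIZATION": re.compile(r"\bworks for\s+((?:[A-Z][a-z]+\s?)+)"),
}


def extract_entities(texts):
    """Collect unique entity strings per category, in order of appearance."""
    found = defaultdict(list)
    for text in texts:
        for category, pattern in CUE_PATTERNS.items():
            for match in pattern.findall(text):
                entity = match.strip()  # drop the trailing space the regex may keep
                if entity not in found[category]:
                    found[category].append(entity)
    return found


def populate(attributes, texts, n_rows):
    """Fill n_rows records, drawing each attribute's values from the entity
    pool of its semantic category (cycling if the pool is short, None if the
    attribute has no known category or no entities were found)."""
    pools = extract_entities(texts)
    rows = []
    for i in range(n_rows):
        row = {}
        for attr in attributes:
            pool = pools.get(ATTRIBUTE_CATEGORY.get(attr, ""), [])
            row[attr] = pool[i % len(pool)] if pool else None
        rows.append(row)
    return rows


# Inline sentences standing in for retrieved web texts
texts = [
    "Dr. Ana Lopez works for Acme Corp in Madrid.",
    "Mr. John Smith, from Alicante, works for Globex.",
]

for row in populate(["name", "city", "employer"], texts, 2):
    print(row)
```

Here the `name` column receives person names, `city` receives place names and `employer` receives organisation names. Note that the sketch combines values from independent pools, so it yields plausible values per column rather than facts that are guaranteed to belong together.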
This paper has been supported by the Spanish Government under projects TIC2000-0664-C02-01/02 and TIC2001-3530-C02-01/02.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moreda, P., Muñoz, R., Martínez-Barco, P., Cachero, C., Palomar, M. (2002). A Web Information Extraction System to DB Prototyping. In: Andersson, B., Bergholtz, M., Johannesson, P. (eds) Natural Language Processing and Information Systems. NLDB 2002. Lecture Notes in Computer Science, vol 2553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36271-1_2
DOI: https://doi.org/10.1007/3-540-36271-1_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00307-6
Online ISBN: 978-3-540-36271-5
eBook Packages: Springer Book Archive