Abstract
The entity-relationship approach to conceptual modelling for database design conventionally begins with the analysis of natural language system specifications to identify entities, attributes, and relationships in preparation for the creation of entity models represented in entity-relationship diagrams. This task of document scanning can be both time-consuming and complex, often requiring linguistic knowledge, subject domain knowledge, judgement and intuition. To help alleviate the burden of this aspect of database design, we present some of our research into the development of tools for analysing natural language specifications and extracting candidate entities, attributes, and relationships. Drawing on research in corpus linguistics and terminology science, our research relies on an examination of patterns of word co-occurrence and the use of ‘linguistic cues’. We indicate how we intend to integrate our tools into a CASE environment to support database designers during each stage of their work, from the analysis of system specifications through to code generation.
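As an illustration only, the two ideas named in the abstract (word co-occurrence patterns and ‘linguistic cues’) might be sketched as follows. This is not the tool described in the paper: the cue patterns, the function names `extract_candidates` and `cooccurrence_counts`, the window size, and the sample specification are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical cue patterns (NOT taken from the paper): each pairs a surface
# pattern with the kind of modelling construct it might signal.
CUE_PATTERNS = [
    # "each customer has a name" -> attribute candidate (entity, attribute)
    (re.compile(r"\beach (\w+) has an? (\w+)"), "attribute"),
    # "a customer places many orders" -> relationship candidate
    (re.compile(r"\ban? (\w+) (\w+) (?:one or more|many|several) (\w+)"),
     "relationship"),
]


def split_sentences(text):
    """Lower-case the text and split it naively on sentence punctuation."""
    return re.split(r"[.!?]\s*", text.lower())


def extract_candidates(text):
    """Return (kind, groups) tuples for every cue pattern match in the text."""
    found = []
    for sentence in split_sentences(text):
        for pattern, kind in CUE_PATTERNS:
            for match in pattern.finditer(sentence):
                found.append((kind, match.groups()))
    return found


def cooccurrence_counts(text, window=2):
    """Count unordered word pairs occurring within `window` tokens of each other."""
    counts = Counter()
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Invented sample specification, used only to exercise the sketch.
SPEC = ("Each customer has a name. A customer places many orders. "
        "Each order has a date.")

candidates = extract_candidates(SPEC)
pairs = cooccurrence_counts(SPEC)

for kind, groups in candidates:
    print(kind, groups)
print(pairs.most_common(3))
```

In a sketch like this, the output is only a list of candidates; a designer would still need to review them, which matches the abstract's framing of the tools as support for, rather than a replacement of, human judgement.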
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fulford, H. (2001). Developing Document Analysis and Data Extraction Tools for Entity Modelling. In: Bouzeghoub, M., Kedad, Z., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2000. Lecture Notes in Computer Science, vol 1959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45399-7_22
DOI: https://doi.org/10.1007/3-540-45399-7_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41943-3
Online ISBN: 978-3-540-45399-4
eBook Packages: Springer Book Archive