Developing Document Analysis and Data Extraction Tools for Entity Modelling

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1959)

Abstract

The entity-relationship approach to conceptual modelling for database design conventionally begins with the analysis of natural language system specifications to identify entities, attributes, and relationships, in preparation for creating entity models represented as entity-relationship diagrams. This document-scanning task can be both time-consuming and complex, often requiring linguistic knowledge, subject-domain knowledge, judgement, and intuition. To help ease this part of the database designer's work, we present some of our research into the development of tools that analyse natural language specifications and extract candidate entities, attributes, and relationships. Drawing on research in corpus linguistics and terminology science, our approach relies on examining patterns of word co-occurrence and on the use of ‘linguistic cues’. We indicate how we intend to integrate our tools into a CASE environment to support database designers at each stage of their work, from the analysis of system specifications through to code generation.
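
The abstract names the two ingredients of the approach (linguistic cues and word co-occurrence) but not the cue inventory or the extraction algorithm. As a minimal sketch under stated assumptions, the Python below applies two hypothetical cue patterns (`HAS_CUE`, `VERB_CUE`) to a toy specification (`SPEC`): a noun phrase following "has" is taken as a candidate attribute of the preceding noun, and a verb linking two determiner-marked nouns ("Each customer places one or more orders") is taken as a candidate relationship. A sentence-level co-occurrence count (`cooccurrence`) stands in, very crudely, for the analysis of word co-occurrence patterns. All names and patterns, as well as the naive sentence splitting and singularisation, are illustrative assumptions and do not reproduce the paper's tools.

```python
from __future__ import annotations

import re
from collections import Counter
from itertools import combinations

# Hypothetical linguistic cues, for illustration only (not the paper's cue set).
DETERMINER = r"(?:each|every|a|an|the)"
# "Each customer has a name and an address" -> entity 'customer', attributes 'name', 'address'
HAS_CUE = re.compile(rf"\b{DETERMINER}\s+(\w+)\s+(?:has|have)\s+(.+)", re.IGNORECASE)
# "Each customer places one or more orders" -> relationship ('customer', 'places', 'order')
VERB_CUE = re.compile(
    rf"\b{DETERMINER}\s+(\w+)\s+(\w+)\s+(?:one or more|many|several|one|a|an)\s+(\w+)",
    re.IGNORECASE,
)
LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter on terminal punctuation (assumes no abbreviations)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def singular(word: str) -> str:
    """Crude singularisation: drop a final 's' unless the word ends in 'ss'."""
    return word[:-1] if word.endswith("s") and not word.endswith("ss") else word


def extract_candidates(text: str):
    """Return candidate entities (with counts), attributes and relationships."""
    entities: Counter = Counter()
    attributes: dict[str, set[str]] = {}
    relationships: set[tuple[str, str, str]] = set()
    for sentence in split_sentences(text):
        has_match = HAS_CUE.search(sentence)
        if has_match:
            owner = singular(has_match.group(1).lower())
            entities[owner] += 1
            # Split "a name, an address and a phone number" into separate attributes.
            for part in re.split(r",\s*|\s+and\s+", has_match.group(2)):
                attribute = LEADING_ARTICLE.sub("", part.strip()).lower()
                if attribute:
                    attributes.setdefault(owner, set()).add(attribute)
            continue
        verb_match = VERB_CUE.search(sentence)
        if verb_match:
            subject, verb, obj = (g.lower() for g in verb_match.groups())
            subject, obj = singular(subject), singular(obj)
            entities[subject] += 1
            entities[obj] += 1
            relationships.add((subject, verb, obj))
    return entities, attributes, relationships


def cooccurrence(text: str, terms: set[str]) -> Counter:
    """Count how often pairs of candidate terms occur in the same sentence."""
    pairs: Counter = Counter()
    for sentence in split_sentences(text):
        words = {w.lower() for w in re.findall(r"\w+", sentence)}
        # Match a term in its singular form or with a plural 's'.
        present = sorted(t for t in terms if t in words or t + "s" in words)
        for pair in combinations(present, 2):
            pairs[pair] += 1
    return pairs


SPEC = (
    "Each customer has a name, an address and a phone number. "
    "Each customer places one or more orders. "
    "An order has an order date and a total. "
    "Each order contains many products."
)

if __name__ == "__main__":
    entities, attributes, relationships = extract_candidates(SPEC)
    print("Entities:", dict(entities))
    print("Attributes:", attributes)
    print("Relationships:", sorted(relationships))
    print("Co-occurrence:", dict(cooccurrence(SPEC, set(entities))))
```

The output is only a list of candidates for the designer to confirm or reject, in keeping with the abstract's description of the tools as support for designers. A practical tool would likely replace the regular expressions with part-of-speech tagging and lemmatisation.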

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fulford, H. (2001). Developing Document Analysis and Data Extraction Tools for Entity Modelling. In: Bouzeghoub, M., Kedad, Z., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2000. Lecture Notes in Computer Science, vol 1959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45399-7_22

  • DOI: https://doi.org/10.1007/3-540-45399-7_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41943-3

  • Online ISBN: 978-3-540-45399-4

  • eBook Packages: Springer Book Archive