Corpus Linguistics for establishing the natural language content of Digital Library documents

  • Classification and Indexing
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 916)

Abstract

The methods of corpus linguistics can reveal a great deal about word use and language structure through careful processing of very large corpora. This information can be used to add organizational structure to digital libraries, both in terms of individual document content and inter-document relations. The structure discovered by corpus linguistics methods reflects the actual use of words and language style in particular domains and genres, rather than being constrained by pre-built categories. The data presented here demonstrate the power of simple word classification methods for discovering semantically related word clusters. Work in progress based on the new balanced entropy principle overcomes a number of limitations of current classification methods and should discover more detailed and accurate information about word relations and text structure.

Editor information

Nabil R. Adam, Bharat K. Bhargava, Yelena Yesha

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Futrelle, R.P., Zhang, X., Sekiya, Y. (1995). Corpus Linguistics for establishing the natural language content of Digital Library documents. In: Adam, N.R., Bhargava, B.K., Yesha, Y. (eds) Digital Libraries Current Issues. DL 1994. Lecture Notes in Computer Science, vol 916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026855

  • DOI: https://doi.org/10.1007/BFb0026855

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-59282-2

  • Online ISBN: 978-3-540-49230-6

  • eBook Packages: Springer Book Archive
