Finding Semantically Related Words in Large Corpora

  • Conference paper
Text, Speech and Dialogue (TSD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Abstract

The paper deals with the linguistic problem of fully automatic grouping of semantically related words. We discuss measures of semantic relatedness between basic word forms and describe the treatment of collocations. We then present a procedure for hierarchical clustering of a very large number of semantically related words and give examples of the resulting partitioning of the data in the form of a dendrogram. Finally, we show a form of output presentation that facilitates inspection of the resulting word clusters.
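To make the pipeline in the abstract concrete, here is a minimal sketch of grouping words by relatedness and building a dendrogram. It is not the authors' method: the abstract does not name the relatedness measure or the linkage criterion, so cosine similarity over windowed co-occurrence counts and average linkage are assumptions chosen for illustration. The function names (`context_vectors`, `cosine`, `cluster`) and the toy corpus are hypothetical, and collocation handling is left out.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def context_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(words, vectors):
    """Average-linkage agglomerative clustering over distinct words.

    Returns the dendrogram as nested tuples, e.g. (("a", "b"), "c").
    """
    sim = {
        frozenset((u, v)): cosine(vectors[u], vectors[v])
        for u, v in combinations(words, 2)
    }
    # Each cluster is a pair: (subtree, list of member words).
    clusters = [(w, [w]) for w in words]

    def linkage(pair):
        (_, members_a), (_, members_b) = pair
        total = sum(sim[frozenset((u, v))] for u in members_a for v in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Merge the two clusters with the highest average pairwise similarity.
        a, b = max(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Toy corpus (hypothetical data, for illustration only).
sentences = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat eats fish".split(),
    "the dog eats meat".split(),
    "a car needs fuel".split(),
    "a truck needs fuel".split(),
]
vectors = context_vectors(sentences, window=2)
print(cluster(["cat", "dog", "car", "truck"], vectors))
```

This naive version compares every pair of clusters at every step, so it is only suitable for small vocabularies. Clustering a very large number of words, as the paper sets out to do, would need a more scalable procedure.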


Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Smrž, P., Rychlý, P. (2001). Finding Semantically Related Words in Large Corpora. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_14

  • DOI: https://doi.org/10.1007/3-540-44805-5_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42557-1

  • Online ISBN: 978-3-540-44805-1

  • eBook Packages: Springer Book Archive
