An Empirical Bayesian Method for Detecting Out of Context Words

  • Conference paper
Text, Speech and Dialogue (TSD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Abstract

In this paper, we propose an empirical Bayesian method for determining whether a word is used out of context. We suggest that a word’s context can be treated as a multinomially distributed random variable, which leads to a simple and direct Bayesian hypothesis test for the problem. We show that this method outperforms a method based on common practice in the literature. We also show that an empirical Bayes approach, in which the behaviour of other words is used to specify a prior distribution on the model parameters, appreciably improves performance where training data are sparse.
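The abstract gives no formulas, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes a Dirichlet prior on the multinomial parameters, a two-hypothesis comparison (the context came from the target word's own context distribution versus a background distribution), and a crude prior built by pooling the context counts of other words. The function names, the `strength` and `floor` parameters, and the toy counts are all hypothetical.

```python
from collections import Counter
from math import lgamma


def log_predictive(context, alpha):
    """Log probability of an ordered sequence of context words under a
    multinomial whose parameters follow Dirichlet(alpha).
    Every context word must be a key of alpha."""
    counts = Counter(context)
    a0 = sum(alpha.values())
    n = sum(counts.values())
    total = lgamma(a0) - lgamma(a0 + n)
    for word, c in counts.items():
        total += lgamma(alpha[word] + c) - lgamma(alpha[word])
    return total


def pooled_prior(other_counts, vocab, strength=1.0, floor=0.01):
    """Assumed stand-in for an empirical Bayes prior: pseudo-counts
    proportional to the pooled context frequencies of other words.
    (A fitted Dirichlet would be more principled; this is a simplification.)"""
    pooled = Counter()
    for counts in other_counts:
        pooled.update(counts)
    total = sum(pooled[w] for w in vocab) or 1
    return {w: floor + strength * pooled[w] / total for w in vocab}


def log_bayes_factor(context, target_counts, background_counts, prior):
    """Positive values favour 'in context', negative values 'out of context'."""
    in_ctx = {w: prior[w] + target_counts.get(w, 0) for w in prior}
    out_ctx = {w: prior[w] + background_counts.get(w, 0) for w in prior}
    return log_predictive(context, in_ctx) - log_predictive(context, out_ctx)


# Toy data (invented for illustration): contexts seen with the target word
# "bank", a background sample, and contexts of two other words.
vocab = ["money", "loan", "river", "water", "fish"]
bank_counts = Counter(money=8, loan=6, river=1)
background_counts = Counter(money=2, loan=1, river=5, water=6, fish=6)
others = [Counter(river=3, water=4), Counter(money=1, fish=2)]

prior = pooled_prior(others, vocab)
lbf_typical = log_bayes_factor(["money", "loan"], bank_counts, background_counts, prior)
lbf_unusual = log_bayes_factor(["water", "fish"], bank_counts, background_counts, prior)
print(lbf_typical > 0, lbf_unusual < 0)
```

In this sketch the prior matters most when `bank_counts` is small: with few training observations the pseudo-counts dominate the posterior, which is the sparse-data regime the abstract refers to.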

Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jabbari, S., Allison, B., Guthrie, L. (2008). An Empirical Bayesian Method for Detecting Out of Context Words. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_15

  • DOI: https://doi.org/10.1007/978-3-540-87391-4_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87390-7

  • Online ISBN: 978-3-540-87391-4

  • eBook Packages: Computer Science, Computer Science (R0)
