
Concordancing Revised or How to Aid the Recognition of New Senses in Very Large Corpora

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1835)

Abstract

This paper describes the application of a framework for text analysis to the problem of distinguishing unusual or non-standard usage of words in large corpora. Identifying such novel uses and augmenting machine-readable dictionaries accordingly is a constant battle for professional lexicographers, who need to update their resources to keep pace with the dynamic, evolving nature of human language. Of equal importance is the need for automatic means of evaluating to what extent a (defining) dictionary accounts for what is found in corpus data. A combination of semi-automatic and automatic means has been explored, and machine learning appears to be a plausible route towards these goals.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kokkinakis, D. (2000). Concordancing Revised or How to Aid the Recognition of New Senses in Very Large Corpora. In: Christodoulakis, D.N. (ed.) Natural Language Processing – NLP 2000. Lecture Notes in Computer Science, vol 1835. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45154-4_34

  • DOI: https://doi.org/10.1007/3-540-45154-4_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67605-8

  • Online ISBN: 978-3-540-45154-9
