Concordancing Revised or How to Aid the Recognition of New Senses in Very Large Corpora

Kokkinakis, Dimitrios

doi:10.1007/3-540-45154-4_34

Conference paper
First Online: 01 January 2000

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1835)

Abstract

This paper describes the application of a framework for text analysis to the problem of distinguishing unusual or non-standard usage of words in large corpora. Identifying such novel uses and augmenting machine-readable dictionaries accordingly is a constant battle for professional lexicographers, who must update their resources to keep up with the dynamic, evolving nature of human language. Equally important is the need for automatic means of evaluating to what extent a (defining) dictionary accounts for what we find in corpus data. A combination of semi-automatic and automatic means has been explored, and Machine Learning appears to be a plausible approach to these goals.

Author information

Authors and Affiliations

Språkdata/Department of Swedish Language, Göteborg University, Box 200, SE-405 30, Sweden
Dimitrios Kokkinakis

Editor information

Editors and Affiliations

Computer Engineering Department and Computer Technology Institute, University of Patras, 26500, Patras, Greece
Dimitris N. Christodoulakis

About this paper

Cite this paper

Kokkinakis, D. (2000). Concordancing Revised or How to Aid the Recognition of New Senses in Very Large Corpora. In: Christodoulakis, D.N. (ed.) Natural Language Processing — NLP 2000. Lecture Notes in Computer Science, vol 1835. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45154-4_34

DOI: https://doi.org/10.1007/3-540-45154-4_34
Published: 25 May 2000
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67605-8
Online ISBN: 978-3-540-45154-9
eBook Packages: Springer Book Archive
