Abstract
This paper describes the application of a text-analysis framework to the problem of identifying unusual or non-standard uses of words in large corpora. Identifying such novel uses and augmenting machine-readable dictionaries accordingly is a constant struggle for professional lexicographers, who must update their resources to keep pace with the dynamic, evolving nature of human language. Equally important is the need for automatic means of evaluating the extent to which a (defining) dictionary accounts for what is found in corpus data. Both semi-automatic and automatic methods have been explored, and Machine Learning appears to be a plausible approach to these goals.
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kokkinakis, D. (2000). Concordancing Revised or How to Aid the Recognition of New Senses in Very Large Corpora. In: Christodoulakis, D.N. (ed.) Natural Language Processing — NLP 2000. Lecture Notes in Computer Science, vol 1835. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45154-4_34
DOI: https://doi.org/10.1007/3-540-45154-4_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67605-8
Online ISBN: 978-3-540-45154-9