Abstract
Thesauri have long been a useful resource for natural language processing. WordNet, a kind of thesaurus, has proven invaluable in computational linguistics. We survey the applications of Roget’s Thesaurus in this field and discuss the advantages of its structure. We evaluate the merits of the 1987 edition of Penguin’s Roget’s Thesaurus of English Words and Phrases as an NLP resource by designing and implementing an electronic lexical knowledge base from its material. We compare Roget’s and WordNet extensively, both qualitatively and quantitatively, and contrast the ontologies and semantic relations of the two thesauri. We discuss the Java design of the lexical knowledge base and its potential applications. We also propose a framework for measuring similarity between concepts and for annotating Roget’s semantic links with WordNet labels.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jarmasz, M., Szpakowicz, S. (2001). The Design and Implementation of an Electronic Lexical Knowledge Base. In: Stroulia, E., Matwin, S. (eds) Advances in Artificial Intelligence. Canadian AI 2001. Lecture Notes in Computer Science, vol 2056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45153-6_32
DOI: https://doi.org/10.1007/3-540-45153-6_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42144-3
Online ISBN: 978-3-540-45153-2
eBook Packages: Springer Book Archive