The Design and Implementation of an Electronic Lexical Knowledge Base

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2056)

Abstract

Thesauri have always been a useful resource for natural language processing. WordNet, a kind of thesaurus, has proven invaluable in computational linguistics. We present the various applications of Roget’s Thesaurus in this field and discuss the advantages of its structure. We evaluate the merits of the 1987 edition of Penguin’s Roget’s Thesaurus of English Words and Phrases as an NLP resource by designing and implementing an electronic lexical knowledge base from its material. We have performed an extensive qualitative and quantitative comparison of Roget’s and WordNet, contrasting the ontologies and the semantic relations of the two thesauri. We discuss the Java design of the lexical knowledge base and its potential applications. We also propose a framework for measuring similarity between concepts and for annotating Roget’s semantic links with WordNet labels.
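The abstract mentions a Java knowledge base and a similarity framework but gives no details of either. Purely as an illustration, the sketch assumes that every word occurrence is stored together with its path through a Roget-style hierarchy (class, section, head, part-of-speech paragraph, semicolon group; these levels are an assumption). It also assumes that relatedness is measured by counting edges through the deepest category two words share. The class `RogetSketch`, its methods, and all category labels are hypothetical. Edge counting is a standard technique used here for illustration and is not necessarily the measure the authors propose.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a Roget-style lexical knowledge base.
 * Each occurrence of a word is identified by its category path from the root,
 * e.g. Class > Section > Head > Paragraph (part of speech) > Semicolon Group.
 * The level names and the edge-counting distance are illustrative assumptions.
 */
public class RogetSketch {
    // word -> all category paths under which the word occurs (words can be polysemous)
    private final Map<String, List<List<String>>> index = new HashMap<>();

    /** Records one occurrence of a word under the given category path. */
    public void add(String word, String... path) {
        index.computeIfAbsent(word.toLowerCase(), k -> new ArrayList<>()).add(List.of(path));
    }

    /**
     * Number of edges between two words attached as leaves under paths a and b:
     * one edge from each word up to its semicolon group, plus the climb from each
     * group up to the deepest shared category.
     */
    static int pathLength(List<String> a, List<String> b) {
        int common = 0;
        while (common < a.size() && common < b.size()
                && a.get(common).equals(b.get(common))) {
            common++;
        }
        return (a.size() - common) + (b.size() - common) + 2;
    }

    /** Shortest edge distance over all occurrence pairs, or -1 if either word is unknown. */
    public int distance(String w1, String w2) {
        List<List<String>> p1 = index.get(w1.toLowerCase());
        List<List<String>> p2 = index.get(w2.toLowerCase());
        if (p1 == null || p2 == null) {
            return -1;
        }
        int best = Integer.MAX_VALUE;
        for (List<String> a : p1) {
            for (List<String> b : p2) {
                best = Math.min(best, pathLength(a, b));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        RogetSketch kb = new RogetSketch();
        // Invented category labels, for illustration only
        kb.add("car", "Class 3", "Section A", "Head 1", "N", "SG 1");
        kb.add("automobile", "Class 3", "Section A", "Head 1", "N", "SG 1");
        kb.add("bicycle", "Class 3", "Section A", "Head 1", "N", "SG 2");
        kb.add("idea", "Class 5", "Section B", "Head 9", "N", "SG 1");
        System.out.println(kb.distance("car", "automobile")); // 2 (same semicolon group)
        System.out.println(kb.distance("car", "bicycle"));    // 4 (same paragraph)
        System.out.println(kb.distance("car", "idea"));       // 12 (only the root is shared)
    }
}
```

A similarity score could then be derived by subtracting the distance from the longest possible path. Taking the minimum over all occurrence pairs lets a polysemous word count as related through whichever of its senses lies closest.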

References

  1. Cassidy, P. (1996). “Modified Roget Available”, http://www.hit.uib.no/corpora/1996-2/0042.html, May 28.

  2. Fellbaum, C. (ed.) (1998a). WordNet: An Electronic Lexical Database. Cambridge: MIT Press.

  3. Fellbaum, C. (1998b). “Towards a Representation of Idioms in WordNet”. In Harabagiu (1998), 52–57.

  4. Harabagiu, S. (ed.) (1998). Proc COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems. Montreal, Canada, August.

  5. Hart, M. (1991). Project Gutenberg Official Home Site. http://www.gutenberg.net/

  6. Hirst, G. and St-Onge, D. (1998). “Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms”. In Fellbaum (1998a), 305–332.

  7. Ide, N. and Véronis, J. (1998). “Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art”. Computational Linguistics, Special Issue on Word Sense Disambiguation, 24(1), 1–40.

  8. Jarmasz, M. and Szpakowicz, S. (2001). Roget’s Thesaurus as an Electronic Lexical Knowledge Base. In W. Gruszczynski and D. Kopcinska (eds.) “NIE BEZ ZNACZENIA. Prace ofiarowane Profesorowi Zygmuntowi Saloniemu z okazji 40-lecia pracy naukowej”, Bialystok (to appear).

  9. Kirkpatrick, B. (1998). Roget’s Thesaurus of English Words and Phrases. Harmondsworth, Middlesex, England: Penguin.

  10. Kwong, O. (1998). “Aligning WordNet with Additional Lexical Resources”. In Harabagiu (1998), 73–79.

  11. McHale, M. (1998). “A Comparison of WordNet and Roget’s Taxonomy for Measuring Semantic Similarity”. In Harabagiu (1998), 115–120.

  12. Mandala, R., Tokunaga, T. and Tanaka, H. (1999). “Complementing WordNet with Roget and Corpus-based Automatically Constructed Thesauri for Information Retrieval”. Proc Ninth Conference of the European Chapter of the Association for Computational Linguistics, Bergen, 94–101.

  13. Masterman, M. (1957). “The Thesaurus in Syntax and Semantics”. Mechanical Translation, 4(1–2), 35–43.

  14. Miller, G. (1998). “Nouns in WordNet”. In Fellbaum (1998a), 23–46.

  15. Miller, G., Beckwith, R., Fellbaum, C., Gross, D. and Miller, K. (1990). “Introduction to WordNet: an on-line lexical database”. International Journal of Lexicography, 3(4), 235–244.

  16. Morris, J. and Hirst, G. (1991). “Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text”. Computational Linguistics, 17(1), 21–48.

  17. NECI Scientific Literature Digital Library, http://citeseer.nj.nec.com/cs

  18. Procter, P. (1978). Longman Dictionary of Contemporary English. Harlow, Essex, England: Longman Group Ltd.

  19. Roget, P. (1852). Roget’s Thesaurus of English Words and Phrases. Harlow, Essex, England: Longman Group Ltd.

  20. Sedelow, S. and Sedelow, W. (1992). “Recent Model-based and Model-related Studies of a Large-scale Lexical Resource [Roget’s Thesaurus]”. Proc 14th International Conference on Computational Linguistics (COLING-92). Nantes, France, August, 1223–1227.

  21. Sparck Jones, K. (1964). Synonymy and Semantic Classification. Ph.D. thesis, University of Cambridge, Cambridge, England.

  22. Vossen, P. (1998). EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Dordrecht: Kluwer Academic Publishers.

  23. Wilks, Y., Slator, B. and Guthrie, L. (1996). Electric Words: Dictionaries, Computers, and Meanings. Cambridge: The MIT Press.

  24. Wilks, Y. (1998). “Language processing and the thesaurus”. Proc National Language Research Institute. Tokyo, Japan.

  25. Yarowsky, D. (1992). “Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora”. Proc 14th International Conference on Computational Linguistics (COLING-92). Nantes, France, August, 454–460.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jarmasz, M., Szpakowicz, S. (2001). The Design and Implementation of an Electronic Lexical Knowledge Base. In: Stroulia, E., Matwin, S. (eds) Advances in Artificial Intelligence. Canadian AI 2001. Lecture Notes in Computer Science, vol 2056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45153-6_32

  • DOI: https://doi.org/10.1007/3-540-45153-6_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42144-3

  • Online ISBN: 978-3-540-45153-2

  • eBook Packages: Springer Book Archive