Word Sense Disambiguation of Farsi Homographs Using Thesaurus and Corpus

Makki, Raheleh; Homayounpour, Mohammad Mehdi

doi:10.1007/978-3-540-85287-2_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

This paper describes the disambiguation of Farsi homographs in unrestricted text using a thesaurus and a corpus. The proposed method is based on [1], with two differences. First, collocational information is used to avoid collecting spurious contexts caused by polysemous words in the thesaurus categories. Second, all words in the test-data context contribute to the score of each conceptual class, including words that do not appear in the collected contexts. Using a Farsi corpus and a Farsi thesaurus, the method correctly disambiguated 91.46% of the instances of 15 Farsi homographs. It was compared with three supervised corpus-based methods: Naïve Bayes, Exemplar-based, and Decision List. Unlike the supervised methods, it needs no training data and performs well on uncommon words. In addition, it can be used to remove some kinds of morphological ambiguity.
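
The class-scoring idea summarized in the abstract can be illustrated with a minimal Python sketch in the spirit of the thesaurus-category approach of [1]. This is not the authors' implementation. The function names, the add-alpha smoothing (used here as one possible way to let unseen context words contribute to every class score), and the toy English data are all assumptions made for illustration, and the collocation-based filtering of collected contexts is omitted.

```python
import math
from collections import Counter


def train_class_weights(class_contexts, alpha=1.0):
    """Return weight(cls, word) = log(P(word | cls) / P(word)).

    class_contexts maps a conceptual class to a list of tokenized contexts
    collected for the words of that class. Add-alpha smoothing is an
    assumption of this sketch; it gives words that never occurred in the
    collected contexts a defined, finite weight.
    """
    class_counts = {
        cls: Counter(word for context in contexts for word in context)
        for cls, contexts in class_contexts.items()
    }
    global_counts = Counter()
    for counts in class_counts.values():
        global_counts.update(counts)

    vocab_size = len(global_counts) + 1  # one extra slot for unseen words
    global_total = sum(global_counts.values())
    class_totals = {cls: sum(c.values()) for cls, c in class_counts.items()}

    def weight(cls, word):
        p_word_given_class = (class_counts[cls][word] + alpha) / (
            class_totals[cls] + alpha * vocab_size
        )
        p_word = (global_counts[word] + alpha) / (
            global_total + alpha * vocab_size
        )
        return math.log(p_word_given_class / p_word)

    return weight


def disambiguate(context, classes, weight):
    """Sum the weights of all context words per class and pick the best class."""
    scores = {cls: sum(weight(cls, w) for w in context) for cls in classes}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two candidate classes for the English homograph "bass".
class_contexts = {
    "ANIMAL": [["fish", "river", "caught"], ["fish", "lake", "boat"]],
    "MUSIC": [["guitar", "play", "band"], ["play", "song", "band"]],
}
weight = train_class_weights(class_contexts)

# "yesterday" never occurs in the collected contexts but still enters the sum.
best, scores = disambiguate(
    ["caught", "fish", "yesterday"], ["ANIMAL", "MUSIC"], weight
)
print(best, scores)
```

In this sketch every word of the test context adds a term to each class score, so the decision does not depend only on words that happened to be seen while collecting contexts. How the paper itself weights unseen words is not stated in the abstract.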

References

1. Yarowsky, D.: Word-sense disambiguation using statistical models of Roget’s categories trained on large corpora. In: 15th [sic] International Conference on Computational Linguistics (Coling), Nantes, pp. 454–460 (1992)
2. Ide, N., Veronis, J.: Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art. Computational Linguistics 24(1), 1–40 (1998)
3. Escudero, G., Marquez, L., Rigau, G.: Naïve Bayes and Exemplar-Based Approaches to Word Sense Disambiguation Revisited. In: 14th European Conference on Artificial Intelligence, ECAI, Berlin, Germany (2000)
4. Gaustad, T.: Linguistic Knowledge and Word Sense Disambiguation, PhD dissertation, Groningen University (2004)
5. Gale, B., Church, K., Yarowsky, D.: A method for disambiguating word senses in a corpus. Computers and the Humanities 26, 415–439 (1992)
6. Bijankhan, M.: Farsi text corpus, Research Center of Intelligent Signal Processing of Iran (RCISP), http://www.rcisp.com
7. Fararooy, J.: Thesaurus and Electronic transfer of Persian language content. In: 2nd workshop on Persian language and computer, Tehran, Iran (2004)
8. Fararooy, J.: Thesaurus of Persian Words and Phrases (1999)
9. Manning, C.D., Raghavan, P., Schütze, H.: An Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
10. Gale, B., Church, K., Yarowsky, D.: Estimating upper and lower bounds on the performance of word-sense disambiguation programs. In: 30th Annual Meeting of the Association for Computational Linguistics, Newark, pp. 249–256 (1992)
11. Ng, H.T.: Exemplar-Based Word Sense Disambiguation: Some Recent Improvements. In: 2nd Conference on Empirical Methods in Natural Language Processing, EMNLP 1997 (1997)
12. Yarowsky, D.: Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In: 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces (1994)
13. Ng, H.T., Lee, H.B.: Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-based Approach. In: 34th Annual Meeting of the Association for Computational Linguistics, pp. 40–47. Association for Computational Linguistics, Somerset, N.J. (1996)

Author information

Authors and Affiliations

Laboratory for Intelligent Sound and Speech Processing, Department of Computer Engineering and Information Technology, Amirkabir University of Technology, Tehran, Iran
Raheleh Makki & Mohammad Mehdi Homayounpour

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 41296, Göteborg, Sweden
Bengt Nordström & Aarne Ranta

About this paper

Cite this paper

Makki, R., Homayounpour, M.M. (2008). Word Sense Disambiguation of Farsi Homographs Using Thesaurus and Corpus. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_30

DOI: https://doi.org/10.1007/978-3-540-85287-2_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)
