
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8003)

Abstract

Stemming is useful for various natural language processing tasks, such as document indexing and text classification, so identifying the correct root of a given word is important. For Hebrew this is not a trivial task, owing to the complexity of Hebrew morphology and orthography. Many Hebrew words are ambiguous in the sense that each can be derived from several possible roots. In a specific context, however, a word has exactly one correct root or no root at all. We have developed a variety of features for finding the correct root of an ambiguous Hebrew word. These features fall into three distinct groups: root-based features, conjugation-based features, and statistical features. Several common machine learning methods were tested to find a successful way of integrating the features. The best result was achieved by Naïve Bayes, with about 87% accuracy.
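
To make the classification setup concrete, the following is a minimal sketch of how binary features from the three groups might be combined by a Naïve Bayes classifier to pick one root among several candidates. It is not the chapter's implementation: the feature encoding, the toy training rows, the candidate names `root_A` and `root_B`, and the 0.5 threshold for the "no root at all" case are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class BinaryNaiveBayes:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.n_features = len(rows[0])
        self.total = len(rows)
        self.class_counts = defaultdict(int)
        # feature_counts[label][j]: rows with this label where feature j is 1
        self.feature_counts = defaultdict(lambda: [0] * self.n_features)
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for j, value in enumerate(row):
                self.feature_counts[label][j] += value
        return self

    def log_score(self, row, label):
        """log P(label) + sum over features of log P(feature value | label)."""
        n = self.class_counts[label]
        score = math.log(n / self.total)
        for j, value in enumerate(row):
            p_one = (self.feature_counts[label][j] + 1) / (n + 2)
            score += math.log(p_one if value else 1.0 - p_one)
        return score

    def prob_correct(self, row):
        """Posterior probability that a candidate root is the correct one."""
        pos, neg = self.log_score(row, 1), self.log_score(row, 0)
        m = max(pos, neg)  # subtract the max for numerical stability
        e_pos, e_neg = math.exp(pos - m), math.exp(neg - m)
        return e_pos / (e_pos + e_neg)


def choose_root(model, candidates, threshold=0.5):
    """Return the best-scoring candidate, or None if none reaches the threshold."""
    best = max(candidates, key=lambda root: model.prob_correct(candidates[root]))
    return best if model.prob_correct(candidates[best]) >= threshold else None


# Hypothetical toy data. Each row describes one (word, candidate root) pair as
# [root-based indicator, conjugation-based indicator, statistical indicator];
# the label is 1 if that candidate was the correct root, 0 otherwise.
train_rows = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 1, 1],
              [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = BinaryNaiveBayes().fit(train_rows, train_labels)
candidates = {"root_A": [1, 1, 0], "root_B": [0, 0, 1]}
best = choose_root(model, candidates)
print(best)
```

Scoring each candidate root separately and taking the maximum reflects the constraint that a word in context has one correct root; returning `None` below the threshold is one possible way to represent the case of no root at all.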




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

HaCohen-Kerner, Y., Erlich, O.T. (2014). Identifying the Correct Root of an Ambiguous Hebrew Word. In: Dershowitz, N., Nissan, E. (eds) Language, Culture, Computation. Computational Linguistics and Linguistics. Lecture Notes in Computer Science, vol 8003. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45327-4_3

  • DOI: https://doi.org/10.1007/978-3-642-45327-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45326-7

  • Online ISBN: 978-3-642-45327-4

  • eBook Packages: Computer Science, Computer Science (R0)
