A Comparison of Lithuanian Morphological Analyzers

  • Conference paper
Text, Speech, and Dialogue (TSD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Abstract

In this paper we present comparative research that discloses the strengths and weaknesses of the two most popular publicly available Lithuanian morphological analyzers, Lemuoklis and Semantika.lt. Their performance on lemmatization, part-of-speech tagging, and fine-grained annotation of morphological categories (such as case, gender, and tense) was evaluated on a morphologically annotated gold-standard corpus covering four domains: administrative, fiction, scientific, and periodical texts. Semantika.lt significantly outperformed Lemuoklis by \(\sim\)1.7%, \(\sim\)2.5%, and \(\sim\)8.1% on the lemmatization, part-of-speech tagging, and fine-grained annotation tasks, achieving accuracies of \(\sim\)98.0%, \(\sim\)95.3%, and \(\sim\)86.8%, respectively.

Semantika.lt was also superior on the administrative, fiction, and periodical texts; however, Lemuoklis achieved similar performance on the scientific texts and even surpassed Semantika.lt on the fine-grained annotation task.
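
The comparison described above rests on per-task accuracy over a shared gold standard, plus a significance check on the difference. The following is a minimal sketch of that kind of evaluation, not the authors' actual procedure: the function names `accuracy` and `mcnemar_exact`, the toy labels, and the choice of an exact McNemar test on token-level disagreements are all assumptions made for illustration.

```python
from math import comb


def accuracy(gold, predicted):
    """Fraction of tokens whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be token-aligned")
    if not gold:
        raise ValueError("cannot score an empty corpus")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def mcnemar_exact(gold, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two systems scored on the same tokens."""
    if not (len(gold) == len(pred_a) == len(pred_b)):
        raise ValueError("all three sequences must be token-aligned")
    # Only tokens where exactly one system is correct carry information.
    only_a = sum(a == g and b != g for g, a, b in zip(gold, pred_a, pred_b))
    only_b = sum(b == g and a != g for g, a, b in zip(gold, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return 1.0  # the systems never disagree in correctness
    k = min(only_a, only_b)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy part-of-speech labels (illustrative only, not data from the paper).
gold = ["NOUN", "VERB", "ADJ", "NOUN"]
system_a = ["NOUN", "VERB", "ADJ", "VERB"]
system_b = ["NOUN", "NOUN", "NOUN", "VERB"]

print(accuracy(gold, system_a), accuracy(gold, system_b))
print(mcnemar_exact(gold, system_a, system_b))
```

The same two functions can be applied separately to lemma, part-of-speech, and fine-grained label sequences, and to each domain subset, to obtain the kind of per-task and per-domain breakdown the abstract reports.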

Notes

  1. At http://tekstynas.vdu.lt/page.xhtml;jsessionid=C27B0743101187E540CD32D0498C9887?id=morphological-annotator.

  2. At http://www.semantika.lt/TextAnnotation/Annotation/Annotate.

  3. More about the Universal Dependencies Project is presented in http://universaldependencies.org/.

  4. The annotated corpus can be downloaded from https://clarin.vdu.lt/xmlui/handle/20.500.11821/9.

Acknowledgments

The authors thank the researchers from LLC Fotonija, especially Virginijus Dadurkevičius, for providing information about the Semantika.lt morphological analyzer.

Author information

Corresponding author

Correspondence to Jurgita Kapočiūtė-Dzikienė.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kapočiūtė-Dzikienė, J., Rimkutė, E., Boizou, L. (2017). A Comparison of Lithuanian Morphological Analyzers. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_6

  • DOI: https://doi.org/10.1007/978-3-319-64206-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64205-5

  • Online ISBN: 978-3-319-64206-2

  • eBook Packages: Computer Science, Computer Science (R0)
