Abstract
We introduce an approach to grouping morphemes into suffix slots in morphologically complex languages using a genetic algorithm. The method is applied to verbs in Amharic, an under-resourced, morphologically rich Semitic language with a number of non-concatenative prefix and suffix morphemes. We start with a limited set of segmented verbs and the set of suffixes themselves, extracted on the basis of our previous work. Each member of the population for the genetic algorithm is an assignment of the morphemes to one of the possible slots. The fitness function combines scores for exact slot position and correct ordering of morphemes. We use mutation but no crossover operator, with various combinations of population size, mutation rate, and number of generations; the models evolve to yield promising morpheme classification results, with 90.02% accuracy. We evaluate the fittest individuals against the known morpheme classes for Amharic.
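The setup described in the abstract can be illustrated with a minimal sketch of a mutation-only genetic algorithm. This is not the authors' implementation: the function names, the weights `w_order` and `w_pos`, the truncation selection scheme, and the exact form of the position score (rewarding a morpheme whose slot equals its index in the word) are all assumptions made for illustration, and the morphemes `a`, `b`, `c` are placeholders rather than Amharic suffixes.

```python
import random

def fitness(assignment, words, w_order=1.0, w_pos=0.5):
    """Score a morpheme -> slot assignment against segmented words.

    The two terms and their weights are illustrative assumptions.
    """
    score = 0.0
    for word in words:
        slots = [assignment[m] for m in word]
        # Ordering term: adjacent morphemes should occupy strictly increasing slots.
        score += w_order * sum(a < b for a, b in zip(slots, slots[1:]))
        # Position term (assumed form): slot number equals index within the word.
        score += w_pos * sum(s == i for i, s in enumerate(slots))
    return score

def mutate(assignment, n_slots, rate, rng):
    """Return a copy in which each morpheme is reassigned with probability `rate`."""
    child = dict(assignment)
    for m in child:
        if rng.random() < rate:
            child[m] = rng.randrange(n_slots)
    return child

def evolve(morphemes, words, n_slots, pop_size=30, rate=0.1,
           generations=200, seed=0):
    """Mutation-only evolution with truncation selection (no crossover)."""
    rng = random.Random(seed)
    population = [{m: rng.randrange(n_slots) for m in morphemes}
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, words), reverse=True)
        survivors = population[: pop_size // 2]  # keep the fitter half unchanged
        children = [mutate(rng.choice(survivors), n_slots, rate, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda ind: fitness(ind, words))

if __name__ == "__main__":
    # Placeholder data: three suffixes observed in three segmented words.
    toy_words = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
    best = evolve(["a", "b", "c"], toy_words, n_slots=3)
    print(best, fitness(best, toy_words))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases; population size, mutation rate, and number of generations are the parameters the abstract reports varying.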
Notes
1. Amharic is written in the Geez script (see http://en.wikipedia.org/wiki/Ge’ez_script). In this paper we write Amharic words using the conventional SERA romanization scheme.
2. CLOG is a freely available ILP system; see http://www-users.cs.york.ac.uk/suresh/CLOG.html.
3. In this paper we do not address the problem of morpheme ambiguity (a morpheme appearing in multiple slots), which arises in many languages, including Amharic. We leave this for future work.
4. A similar approach would apply to the prefixes, which are in any case simpler in Amharic.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Mulugeta, W., Gasser, M., Yimam, B. (2016). Automatic Morpheme Slot Identification Using Genetic Algorithm. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2013. Lecture Notes in Computer Science, vol 9561. Springer, Cham. https://doi.org/10.1007/978-3-319-43808-5_7
DOI: https://doi.org/10.1007/978-3-319-43808-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43807-8
Online ISBN: 978-3-319-43808-5
eBook Packages: Computer Science, Computer Science (R0)