Automatic Morpheme Slot Identification Using Genetic Algorithm

Mulugeta, Wondwossen; Gasser, Michael; Yimam, Baye

doi:10.1007/978-3-319-43808-5_7

Conference paper
First Online: 30 July 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9561)

Abstract

We introduce an approach to grouping morphemes into suffix slots in morphologically complex languages using a genetic algorithm. The method is applied to verbs in Amharic, an under-resourced, morphologically rich Semitic language with a number of non-concatenative prefix and suffix morphemes. We start with a limited set of segmented verbs and the set of suffixes themselves, extracted on the basis of our previous work. Each member of the population for the genetic algorithm is an assignment of each morpheme to one of the possible slots. The fitness function combines scores for exact slot position and correct ordering of morphemes. We use mutation but no crossover operator, with various combinations of population size, mutation rate, and number of generations, and the models evolve to yield promising morpheme classification results, reaching 90.02% accuracy. We evaluate the fittest individuals on the basis of the known morpheme classes for Amharic.
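The abstract names the ingredients of the method without giving formulas, so the following is only a minimal sketch of a mutation-only genetic algorithm over morpheme-to-slot assignments, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names `fitness`, `mutate`, and `evolve`, the weights `w_pos` and `w_ord`, the truncation-style selection that keeps the better half of the population, and above all the two scoring terms. The position term is a crude stand-in that rewards the i-th suffix of a word for sitting in slot i; the ordering term rewards adjacent suffixes whose slots strictly increase.

```python
import random


def fitness(assignment, words, w_pos=1.0, w_ord=1.0):
    """Score a morpheme -> slot assignment against segmented words.

    Both terms are hypothetical stand-ins for the scores named in the abstract.
    """
    pos_score = 0
    ord_score = 0
    for suffixes in words:
        slots = [assignment[m] for m in suffixes]
        # Position term (assumed): the i-th suffix of a word sits in slot i.
        pos_score += sum(1 for i, s in enumerate(slots, start=1) if s == i)
        # Ordering term (assumed): adjacent suffixes have increasing slots.
        ord_score += sum(1 for a, b in zip(slots, slots[1:]) if a < b)
    return w_pos * pos_score + w_ord * ord_score


def mutate(assignment, n_slots, rate, rng):
    """Return a copy in which each morpheme is reassigned with probability `rate`."""
    child = dict(assignment)
    for morpheme in child:
        if rng.random() < rate:
            child[morpheme] = rng.randint(1, n_slots)
    return child


def evolve(words, n_slots, pop_size=30, rate=0.2, generations=100, seed=0):
    """Mutation-only search: keep the better half, refill with mutated copies."""
    rng = random.Random(seed)
    morphemes = sorted({m for w in words for m in w})
    population = [
        {m: rng.randint(1, n_slots) for m in morphemes} for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, words), reverse=True)
        survivors = population[: pop_size // 2]
        children = [
            mutate(rng.choice(survivors), n_slots, rate, rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children  # no crossover step
    return max(population, key=lambda ind: fitness(ind, words))


# Placeholder suffix labels, not real Amharic data.
toy_words = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
best = evolve(toy_words, n_slots=3)
print(best, fitness(best, toy_words))
```

Because the survivors are carried over unchanged, the best score in this sketch never decreases from one generation to the next. Population size, mutation rate, and number of generations are exposed as parameters since the abstract reports trying various combinations of them.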

Notes

1. Amharic is written in the Geez script (see http://en.wikipedia.org/wiki/Ge’ez_script). In this paper we write Amharic words using the conventional SERA romanization scheme.
2. CLOG is a freely available ILP system, available at http://www-users.cs.york.ac.uk/suresh/CLOG.html.
3. In this paper we do not address the problem of morpheme ambiguity (a morpheme appearing in multiple slots), which arises in many languages, including Amharic. We leave this for future work.
4. A similar approach would apply to the prefixes, which are in any case simpler in Amharic.

References

Beesley, K.R., Karttunen, L.: Finite State Morphology. CSLI Studies in Computational Linguistics, vol. 3. CSLI Publications, Stanford (2003)
Bender, M.L.: Amharic verb morphology: a generative approach. Ph.D. thesis, Graduate School of Texas (1968)
De Pauw, G., Wagacha, P.W.: Bootstrapping morphological analysis of Gikuyu using unsupervised maximum entropy learning. In: Proceedings of the Eighth INTERSPEECH Conference, Antwerp, Belgium (2007)
Goldsmith, J.: The unsupervised learning of natural language morphology. Comput. Linguist. 27, 153–198 (2001)
Hammarström, H., Borin, L.: Unsupervised learning of morphology. Comput. Linguist. 37(2), 309–350 (2011)
Holland, J.H.: Adaptation in Natural and Artificial Systems. MIT Press, Cambridge (1992)
Kaplan, R.M., Kay, M.: Regular models of phonological rule systems. Comput. Linguist. 20(3), 331–378 (1994)
Karttunen, L., Kaplan, R.M., Zaenen, A.: Two level morphology with composition. In: Proceedings of the International Conference on Computational Linguistics, vol. 14, no. 1, pp. 141–148 (1992)
Kazakov, D.: Achievements and prospects of learning word morphology with inductive logic programming. In: Cussens, J., Džeroski, S. (eds.) LLL 1999. LNCS (LNAI), vol. 1925, pp. 89–109. Springer, Heidelberg (2000)
Koskenniemi, K.: Two level morphology: a general computational model for word-form recognition and production. In: Proceedings of the 10th International Conference on Computational Linguistics-COLING 1984, pp. 178–181. Association for Computational Linguistics (1984)
Manandhar, S., Džeroski, S., Erjavec, T.: Learning multilingual morphology with CLOG. In: Page, D.L. (ed.) ILP 1998. LNCS, vol. 1446, pp. 135–144. Springer, Heidelberg (1998)
Mooney, R.J.: Inductive logic programming. In: Mitkov, R. (ed.) Oxford Handbook of Computational Linguistics, pp. 376–394. Oxford University Press, Oxford (1997)
Mooney, R.J.: Machine learning. In: Mitkov, R. (ed.) Oxford Handbook of Computational Linguistics, pp. 376–394. Oxford University Press, Oxford (2003)
Wondwossen, M., Gasser, M., Baye, Y.: Incremental learning of affix segmentation. In: Proceedings of the 24th International Conference on Computational Linguistics-COLING 2012, pp. 1901–1914. Association for Computational Linguistics (ACL), Mumbai, India (2012)
Wondwossen, M., Gasser, M.: Learning morphological rules for Amharic verbs using inductive logic programming. In: Proceedings of SALTMIL-AfLaT Workshop on Language Technology for Normalisation of Less-Resourced Languages, Istanbul, Turkey, pp. 7–12 (2012)
Oflazer, K., Nirenburg, S., McShane, M.: Bootstrapping morphological analyzers by combining human elicitation and machine learning. Comput. Linguist. 27(1), 59–85 (2001)
Spiegler, S.R.: Machine learning for the analysis of morphologically complex languages. Ph.D. thesis, University of Bristol (2011)
Baye, Y.: Yamarigna Sewasiw (Amharic Grammar). EMPDA Publications, Addis Ababa (1995)
Ivanovska, A., Zdravkova, K., Džeroski, S., Erjavec, T.: Learning rules for morphological analysis and synthesis of Macedonian nouns. In: Proceedings of SiKDD-2005 Conference on Data Mining and Data Warehouses, Ljubljana, Slovenia, pp. 195–198 (2005)

Author information

Authors and Affiliations

Addis Ababa University, Addis Ababa, Ethiopia
Wondwossen Mulugeta & Baye Yimam
Indiana University, Bloomington, USA
Michael Gasser

Corresponding author

Correspondence to Wondwossen Mulugeta.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI GmbH), Saarbrücken, Saarland, Germany
Hans Uszkoreit
Adam Mickiewicz University, Poznań, Poland
Marek Kubis

Cite this paper

Mulugeta, W., Gasser, M., Yimam, B. (2016). Automatic Morpheme Slot Identification Using Genetic Algorithm. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2013. Lecture Notes in Computer Science, vol 9561. Springer, Cham. https://doi.org/10.1007/978-3-319-43808-5_7

DOI: https://doi.org/10.1007/978-3-319-43808-5_7
Published: 30 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43807-8
Online ISBN: 978-3-319-43808-5
eBook Packages: Computer Science, Computer Science (R0)