Automatic Morpheme Slot Identification Using Genetic Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9561)

Abstract

We introduce an approach to grouping morphemes into suffix slots in morphologically complex languages using a genetic algorithm. The method is applied to verbs in Amharic, an under-resourced, morphologically rich Semitic language with a number of non-concatenative prefix and suffix morphemes. We start with a limited set of segmented verbs and the set of suffixes themselves, extracted on the basis of our previous work. Each member of the genetic algorithm's population is an assignment of every morpheme to one of the possible slots. The fitness function combines scores for exact slot position and for correct ordering of morphemes. We use mutation but no crossover operator, with various combinations of population size, mutation rate, and number of generations, and the models evolve to yield promising morpheme classification results, reaching 90.02% accuracy. We evaluate the fittest individuals against the known morpheme classes for Amharic.
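The abstract describes the search only at a high level. As a minimal sketch under stated assumptions, not the authors' implementation, the following Python code encodes each individual as a mapping from suffix to slot, scores it with a weighted combination of an exact-position score and an ordering score, and evolves the population with mutation only (no crossover). The toy suffix strings, the GOLD classes, NUM_SLOTS, the weight alpha, the reading of "exact slot position" as the suffix's position after the stem, and truncation selection are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy data (NOT from the paper): each verb is the ordered list of
# suffixes that follow the stem, written as placeholder strings.
SEGMENTED_VERBS = [
    ["a", "t"],
    ["a", "w"],
    ["u", "t"],
    ["u", "m"],
    ["a", "t", "m"],
    ["u", "w", "m"],
]
SUFFIXES = sorted({s for verb in SEGMENTED_VERBS for s in verb})
NUM_SLOTS = 3  # assumed number of suffix slots

# Hypothetical "known" slot classes, used only for evaluation.
GOLD = {"a": 0, "u": 0, "t": 1, "w": 1, "m": 2}


def fitness(individual, verbs, alpha=0.5):
    """Weighted sum of an exact-position score and an ordering score.

    Assumption: "exact slot position" means the slot index equals the
    suffix's position after the stem; "ordering" rewards strictly increasing
    slots for adjacent suffixes. alpha is a hypothetical weight.
    """
    exact_hits = exact_total = order_hits = order_total = 0
    for verb in verbs:
        slots = [individual[s] for s in verb]
        for position, slot in enumerate(slots):
            exact_total += 1
            exact_hits += slot == position
        for left, right in zip(slots, slots[1:]):
            order_total += 1
            order_hits += left < right
    exact = exact_hits / exact_total if exact_total else 0.0
    order = order_hits / order_total if order_total else 0.0
    return alpha * exact + (1 - alpha) * order


def mutate(individual, rate, rng):
    """Reassign each suffix to a random slot with probability `rate`."""
    child = dict(individual)
    for suffix in child:
        if rng.random() < rate:
            child[suffix] = rng.randrange(NUM_SLOTS)
    return child


def evolve(verbs, pop_size=20, mutation_rate=0.2, generations=100, seed=0):
    """Mutation-only GA with truncation selection (selection scheme assumed)."""
    rng = random.Random(seed)
    population = [
        {s: rng.randrange(NUM_SLOTS) for s in SUFFIXES} for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, verbs), reverse=True)
        survivors = population[: pop_size // 2]
        # No crossover: offspring are mutated copies of surviving individuals.
        children = [
            mutate(rng.choice(survivors), mutation_rate, rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=lambda ind: fitness(ind, verbs))


def accuracy(individual, gold):
    """Fraction of suffixes whose assigned slot matches the known class."""
    return sum(individual[s] == gold[s] for s in gold) / len(gold)


if __name__ == "__main__":
    best = evolve(SEGMENTED_VERBS)
    print(best, round(fitness(best, SEGMENTED_VERBS), 3), accuracy(best, GOLD))
```

Keeping the best half of each generation makes the best fitness non-decreasing, which matters when mutation is the only source of variation. Population size, mutation rate, and number of generations are exposed as parameters because those are the settings the abstract reports varying.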

Notes

  1. Amharic is written in the Ge’ez script (see http://en.wikipedia.org/wiki/Ge’ez_script). In this paper we write Amharic words using the SERA romanization convention.

  2. CLOG is a freely available ILP (inductive logic programming) system at http://www-users.cs.york.ac.uk/suresh/CLOG.html.

  3. In this paper we do not address the problem of morpheme ambiguity (a morpheme appearing in multiple slots), which arises in many languages, including Amharic. We leave this for future work.

  4. A similar approach would apply to the prefixes, which are in any case simpler in Amharic.

References

  1. Beesley, K.R., Karttunen, L.: Finite State Morphology. CSLI Studies in Computational Linguistics, vol. 3. CSLI Publications, Stanford (2003)

  2. Bender, M.L.: Amharic verb morphology: a generative approach. Ph.D. thesis, Graduate School of Texas (1968)

  3. De Pauw, G., Wagacha, P.W.: Bootstrapping morphological analysis of Gikuyu using unsupervised maximum entropy learning. In: Proceedings of the Eighth INTERSPEECH Conference, Antwerp, Belgium (2007)

  4. Goldsmith, J.: Unsupervised learning of the morphology of a natural language. Comput. Linguist. 27(2), 153–198 (2001)

  5. Hammarström, H., Borin, L.: Unsupervised learning of morphology. Comput. Linguist. 37(2), 309–350 (2011)

  6. Holland, J.H.: Adaptation in Natural and Artificial Systems. MIT Press, Cambridge (1992)

  7. Kaplan, R.M., Kay, M.: Regular models of phonological rule systems. Comput. Linguist. 20(3), 331–378 (1994)

  8. Karttunen, L., Kaplan, R.M., Zaenen, A.: Two-level morphology with composition. In: Proceedings of the International Conference on Computational Linguistics, vol. 14, no. 1, pp. 141–148 (1992)

  9. Kazakov, D.: Achievements and prospects of learning word morphology with inductive logic programming. In: Cussens, J., Džeroski, S. (eds.) LLL 1999. LNCS (LNAI), vol. 1925, pp. 89–109. Springer, Heidelberg (2000)

  10. Koskenniemi, K.: Two-level morphology: a general computational model for word-form recognition and production. In: Proceedings of the 10th International Conference on Computational Linguistics (COLING 1984), pp. 178–181. Association for Computational Linguistics (1984)

  11. Manandhar, S., Džeroski, S., Erjavec, T.: Learning multilingual morphology with CLOG. In: Page, D.L. (ed.) ILP 1998. LNCS, vol. 1446, pp. 135–144. Springer, Heidelberg (1998)

  12. Mooney, R.J.: Inductive logic programming. In: Mitkov, R. (ed.) Oxford Handbook of Computational Linguistics, pp. 376–394. Oxford University Press, Oxford (1997)

  13. Mooney, R.J.: Machine learning. In: Mitkov, R. (ed.) Oxford Handbook of Computational Linguistics, pp. 376–394. Oxford University Press, Oxford (2003)

  14. Wondwossen, M., Gasser, M., Baye, Y.: Incremental learning of affix segmentation. In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pp. 1901–1914. Association for Computational Linguistics, Mumbai, India (2012)

  15. Wondwossen, M., Gasser, M.: Learning morphological rules for Amharic verbs using inductive logic programming. In: Proceedings of the SALTMIL-AfLaT Workshop on Language Technology for Normalisation of Less-Resourced Languages, Istanbul, Turkey, pp. 7–12 (2012)

  16. Oflazer, K., Nirenburg, S., McShane, M.: Bootstrapping morphological analyzers by combining human elicitation and machine learning. Comput. Linguist. 27(1), 59–85 (2001)

  17. Spiegler, S.R.: Machine learning for the analysis of morphologically complex languages. Ph.D. thesis, University of Bristol (2011)

  18. Baye, Y.: Yamarigna Sewasiw (Amharic Grammar). EMPDA Publications, Addis Ababa (1995)

  19. Ivanovska, A., Zdravkova, K., Džeroski, S., Erjavec, T.: Learning rules for morphological analysis and synthesis of Macedonian nouns. In: Proceedings of the SiKDD-2005 Conference on Data Mining and Data Warehouses, Ljubljana, Slovenia, pp. 195–198 (2005)

Author information

Correspondence to Wondwossen Mulugeta.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mulugeta, W., Gasser, M., Yimam, B. (2016). Automatic Morpheme Slot Identification Using Genetic Algorithm. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2013. Lecture Notes in Computer Science, vol. 9561. Springer, Cham. https://doi.org/10.1007/978-3-319-43808-5_7

  • DOI: https://doi.org/10.1007/978-3-319-43808-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43807-8

  • Online ISBN: 978-3-319-43808-5

  • eBook Packages: Computer Science, Computer Science (R0)
