A Multi-purpose Bayesian Model for Word-Based Morphology

  • Conference paper
Systems and Frameworks for Computational Morphology (SFCM 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 537)

Abstract

This paper introduces a probabilistic model of morphology based on a word-based morphological theory. Morphology is understood here as a system of rules that describe systematic correspondences between full word forms, without decomposing words into smaller units. The model is formulated in the Bayesian learning framework and can be trained in both supervised and unsupervised settings. It is evaluated on three tasks: generating unseen words, lemmatization, and inflected form production.
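
The following minimal Python sketch illustrates a rule as a correspondence between two full word forms. It is not the paper's model. It assumes the simplest possible rule type, a suffix substitution /X old/ → /X new/, learned from a single word pair by stripping the longest common prefix. The names `SuffixRule` and `extract_rule` are hypothetical. Applying a rule returns a set of words, and that set is empty when the left-hand-side constraint (here, the required ending) is not met.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    """Hypothetical rule /X<old>/ -> /X<new>/ relating two full word forms."""

    old: str  # ending required on the left-hand side
    new: str  # ending produced on the right-hand side

    def __call__(self, word: str) -> set[str]:
        # Left-hand-side constraint: the word must end in `old`.
        # If it does not, the rule is not applicable and yields the empty set.
        if not word.endswith(self.old):
            return set()
        # Slice explicitly by length so that an empty `old` keeps the whole word.
        stem = word[: len(word) - len(self.old)]
        return {stem + self.new}


def extract_rule(source: str, target: str) -> SuffixRule:
    """Build a suffix rule from a word pair by stripping their longest common prefix."""
    i = 0
    while i < min(len(source), len(target)) and source[i] == target[i]:
        i += 1
    return SuffixRule(old=source[i:], new=target[i:])


rule = extract_rule("machen", "macht")
print(rule)            # SuffixRule(old='en', new='t')
print(rule("lachen"))  # {'lacht'}
print(rule("Haus"))    # set()
```

This toy version compares only endings, so it returns at most one word. Rules that also alter prefixes or word-internal material, such as German umlaut, would need a richer left-hand side and could yield several outputs.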

Notes

  1. The notion of root used here is unrelated to the definition typically used in morphology. It denotes the root of a derivational tree, like the one shown in Fig. 1, and can in principle be any word.

  2. The distribution of set length has a negligible influence on the behavior of the model and is included only for formal completeness. The Poisson distribution is chosen for its mathematical simplicity.

  3. In our formalism, rules are functions mapping words to sets of words. The set is empty if the constraints on the left-hand side of the rule are not met. Otherwise, a single word is typically produced, but cases with more than one result are also possible.

  4. For simplicity, it is assumed here that the newly inserted word does not take over any child nodes from other words.

  5. Although ber is not a valid German word, it may happen to occur in the data, for example as an abbreviation or a foreign word.

  6. http://corpora.uni-leipzig.de.

  7. http://sjp.pl/slownik/odmiany/.
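
Note 2 states that set length follows a Poisson distribution. As a worked illustration, and not the paper's code, the log-probability of a set of n words under a Poisson prior with rate λ is n log λ - λ - log n!. The sketch below computes this quantity in Python. The function name `poisson_log_pmf` and the rate value `lam = 2.0` are arbitrary assumptions, since the excerpt does not give the paper's parameter.

```python
import math


def poisson_log_pmf(n: int, lam: float) -> float:
    """Return log P(N = n) for N ~ Poisson(lam).

    Uses log(lam**n * exp(-lam) / n!) = n*log(lam) - lam - log(n!),
    with log(n!) computed as lgamma(n + 1) to avoid overflow for large n.
    """
    if n < 0:
        raise ValueError("set size must be non-negative")
    if lam <= 0:
        raise ValueError("rate must be positive")
    return n * math.log(lam) - lam - math.lgamma(n + 1)


# Hypothetical rate; the value used in the paper is not stated here.
lam = 2.0
print(poisson_log_pmf(0, lam))                      # -2.0
print(round(math.exp(poisson_log_pmf(3, lam)), 4))  # 0.1804
```

This term depends only on n, so it contributes a single additive constant to the log-probability of any set of a given size. That is consistent with the note's remark that its influence on the model is negligible.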

Author information

Corresponding author: Maciej Janicki.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Janicki, M. (2015). A Multi-purpose Bayesian Model for Word-Based Morphology. In: Mahlow, C., Piotrowski, M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2015. Communications in Computer and Information Science, vol 537. Springer, Cham. https://doi.org/10.1007/978-3-319-23980-4_7

  • DOI: https://doi.org/10.1007/978-3-319-23980-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23978-1

  • Online ISBN: 978-3-319-23980-4

  • eBook Packages: Computer Science, Computer Science (R0)
