Abstract
This paper introduces a probabilistic model of morphology based on a word-based morphological theory. Morphology is understood here as a system of rules that describe systematic correspondences between full word forms, without decomposing words into any smaller units. The model is formulated in the Bayesian learning framework and can be trained in both supervised and unsupervised settings. Evaluation is performed on three tasks: generating unseen words, lemmatization, and inflected form production.
Notes
- 1.
The notion of root used here has nothing to do with the definition typically used in morphology. It denotes the root of a derivational tree, like the one shown in Fig. 1, which in principle can be any word.
- 2.
The distribution of set length has negligible influence on the behavior of the model and is included only for formal completeness. The Poisson distribution is chosen for its mathematical simplicity.
- 3.
In our formalism, rules are functions mapping words to sets of words. The set is empty if the constraints on the left-hand side of the rule are not met. Otherwise, typically a single word is produced, but cases with more than one result are also possible.
- 4.
For simplicity, it is assumed here that the newly inserted word does not take over any child nodes from other words.
- 5.
Although ber is not a valid German word, it may happen to occur in the data, for example as an abbreviation or a foreign word.
- 6.
- 7.
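Note 3 describes rules as functions mapping words to sets of words. The following is a minimal sketch of that idea, not the paper's implementation: the `Rule` class, its four fields, and the pattern shape /X a Y s/ → /X b Y t/ (with X and Y arbitrary, possibly empty strings) are assumptions made for illustration only. It shows the three cases named in the note: an empty set when the left-hand side does not match, a single result in the typical case, and several results when the pattern matches at more than one position.

```python
from typing import NamedTuple, Set


class Rule(NamedTuple):
    """Hypothetical rule /X a Y s/ -> /X b Y t/ (illustrative format only)."""
    infix_from: str   # a: substring to replace inside the stem ("" = none)
    suffix_from: str  # s: suffix required on the input word
    infix_to: str     # b: replacement for the infix
    suffix_to: str    # t: suffix of the output word

    def apply(self, word: str) -> Set[str]:
        # Left-hand-side constraint: the word must end in suffix_from.
        if not word.endswith(self.suffix_from):
            return set()
        stem = word[:len(word) - len(self.suffix_from)]
        if not self.infix_from:
            # Pure suffix rule: exactly one result.
            return {stem + self.suffix_to}
        # Infix rule: one result per position where the infix occurs.
        results = set()
        start = stem.find(self.infix_from)
        while start != -1:
            end = start + len(self.infix_from)
            results.add(stem[:start] + self.infix_to + stem[end:] + self.suffix_to)
            start = stem.find(self.infix_from, start + 1)
        return results


suffix_rule = Rule("", "en", "", "ung")   # /Xen/ -> /Xung/
umlaut_rule = Rule("a", "", "ä", "e")     # /XaY/ -> /XäYe/

print(suffix_rule.apply("bilden"))   # one result
print(suffix_rule.apply("haus"))     # constraint not met: empty set
print(umlaut_rule.apply("anlass"))   # two match positions, two results
```

Returning a set rather than a single string keeps the non-matching case and the ambiguous case in the same interface, so a caller never has to distinguish "no output" from "one output" by a special value.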
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Janicki, M. (2015). A Multi-purpose Bayesian Model for Word-Based Morphology. In: Mahlow, C., Piotrowski, M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2015. Communications in Computer and Information Science, vol 537. Springer, Cham. https://doi.org/10.1007/978-3-319-23980-4_7
DOI: https://doi.org/10.1007/978-3-319-23980-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23978-1
Online ISBN: 978-3-319-23980-4
eBook Packages: Computer Science (R0)