A Multi-purpose Bayesian Model for Word-Based Morphology

Janicki, Maciej

doi:10.1007/978-3-319-23980-4_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 537)

Included in the following conference series:

International Workshop on Systems and Frameworks for Computational Morphology

Abstract

This paper introduces a probabilistic model of morphology based on a word-based morphological theory. Morphology is understood here as a system of rules that describe systematic correspondences between full word forms, without decomposing words into any smaller units. The model is formulated in the Bayesian learning framework and can be trained in both supervised and unsupervised settings. It is evaluated on three tasks: generating unseen words, lemmatization, and producing inflected forms.

Notes

1. The notion of root used here has nothing to do with the definition typically used in morphology. It is meant as the root of a derivational tree, like the one shown in Fig. 1, which in principle can be any word.
2. The distribution of set length has negligible influence on the behavior of the model and is included only for formal completeness. The Poisson distribution is chosen because of its mathematical simplicity.
3. In our formalism, rules are functions mapping words to sets of words. The set is empty if the constraints on the left-hand side of the rule are not met. Otherwise, typically a single word is produced, but cases with more than one result are also possible.
4. For simplicity, it is assumed here that the newly inserted word does not take over any child nodes from other words.
5. Although ber is not a valid German word, it may happen to occur in the data, for example as an abbreviation or a foreign word.
6. http://corpora.uni-leipzig.de.
7. http://sjp.pl/slownik/odmiany/.
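
Note 3 describes rules as functions from words to sets of words. The following is a minimal sketch of that idea only, not the representation used in the paper: it assumes a hypothetical rule type that replaces one word-final string with another, and the helper name `make_suffix_rule` and the German example pair are illustrative choices.

```python
from typing import Callable, Set

# A rule is modelled as a function: word -> set of words (assumption
# for illustration; the paper's actual rule format is not shown here).
Rule = Callable[[str], Set[str]]


def make_suffix_rule(old_suffix: str, new_suffix: str) -> Rule:
    """Build a hypothetical rule that rewrites a word-final string."""

    def rule(word: str) -> Set[str]:
        # Left-hand side constraint: the word must end in old_suffix
        # and leave a non-empty remainder. If not, the result is the
        # empty set, as Note 3 states.
        if not word.endswith(old_suffix) or len(word) <= len(old_suffix):
            return set()
        stem = word[: len(word) - len(old_suffix)]
        # Typically a single word is produced. Returning a set leaves
        # room for rules that yield more than one result.
        return {stem + new_suffix}

    return rule


# Illustrative rule: word-final "en" is replaced by "t".
en_to_t = make_suffix_rule("en", "t")
print(en_to_t("machen"))
print(en_to_t("haus"))
```

Returning a set rather than an optional string keeps the empty case and the multi-result case in one return type, which matches the wording of Note 3.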

Author information

Authors and Affiliations

Institute of Computer Science, University of Leipzig, Augustusplatz 10, 04109, Leipzig, Germany
Maciej Janicki

Corresponding author

Correspondence to Maciej Janicki.

Editor information

Editors and Affiliations

Institut für Deutsche Sprache, Mannheim, Germany
Cerstin Mahlow
Leibniz Institute of European History, Mainz, Germany
Michael Piotrowski

About this paper

Cite this paper

Janicki, M. (2015). A Multi-purpose Bayesian Model for Word-Based Morphology. In: Mahlow, C., Piotrowski, M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2015. Communications in Computer and Information Science, vol 537. Springer, Cham. https://doi.org/10.1007/978-3-319-23980-4_7

DOI: https://doi.org/10.1007/978-3-319-23980-4_7
Published: 09 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23978-1
Online ISBN: 978-3-319-23980-4
eBook Packages: Computer Science, Computer Science (R0)
