Abstract
The aim of unsupervised and knowledge-free morphological segmentation is to identify the boundaries between morphs in the words of a given language without relying on any knowledge source about that language. This paper describes a segmentation method that draws on previous approaches based on both semantic and orthographic similarity to identify morphologically related words. Using a version of Multiple Sequence Alignment, a technique originally applied in bioinformatics, the method extracts both concatenative and non-concatenative (e.g. introflection and circumfixation) morphological patterns. It can thus handle languages of different morphological types as well as non-dominant morphological processes within languages of a particular predominant morphological type.
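To make the alignment idea concrete, the following is a minimal sketch of pairwise global alignment (Needleman-Wunsch), the building block that multiple sequence alignment generalises to more than two sequences. It is not the method of the paper: the function names `align` and `shared_pattern`, the unit scores, and the example words are assumptions chosen purely for illustration.

```python
def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align two words (Needleman-Wunsch); return the gapped strings.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def shared_pattern(x: str, y: str) -> str:
    """Keep columns where both aligned words agree; mark the rest with '_'."""
    return "".join(c if c == d else "_" for c, d in zip(x, y))


# Concatenative case: the shared stem is contiguous.
print(shared_pattern(*align("walk", "walked")))   # walk__
# Non-concatenative case (transliterated Arabic 'book' / 'books'):
# the shared material is interleaved with the varying vowels.
print(shared_pattern(*align("kitab", "kutub")))   # k_t_b
```

In this toy setting, a contiguous shared block suggests affixation, while shared columns interrupted by varying ones suggest a non-concatenative pattern such as introflection. A real system would align many related words at once and would need a principled scoring scheme.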
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kirschenbaum, A. (2013). Unsupervised Segmentation for Different Types of Morphological Processes Using Multiple Sequence Alignment. In: Dediu, AH., Martín-Vide, C., Mitkov, R., Truthe, B. (eds) Statistical Language and Speech Processing. SLSP 2013. Lecture Notes in Computer Science, vol 7978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39593-2_14
DOI: https://doi.org/10.1007/978-3-642-39593-2_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39592-5
Online ISBN: 978-3-642-39593-2
eBook Packages: Computer Science (R0)