Unsupervised Segmentation for Different Types of Morphological Processes Using Multiple Sequence Alignment

Conference paper
Statistical Language and Speech Processing (SLSP 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7978)

Abstract

The aim of unsupervised and knowledge-free morphological segmentation is to identify the boundaries between morphs in the words of a given language without relying on any knowledge source about that language. This paper describes a segmentation method that draws on previous approaches using both semantic and orthographic similarity to identify morphologically related words. Using a variant of Multiple Sequence Alignment (MSA), a technique originally developed in bioinformatics, the method extracts both concatenative and non-concatenative (e.g. introflection and circumfixation) morphological patterns. It can therefore handle languages of different morphological types, as well as non-dominant morphological processes within languages of a particular predominant morphological type.
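The core intuition behind alignment-based segmentation is that when related word forms are aligned character by character, the material they share and the material that varies end up in separate columns. The abstract does not describe the paper's actual MSA procedure, so the following is only a minimal sketch of that intuition, not the author's method. It substitutes a plain pairwise Needleman-Wunsch global alignment with assumed unit scores (match +1, mismatch -1, gap -1), and the helper names `align` and `pattern` are hypothetical. Columns where both words agree are kept; each run of differing columns collapses into a single `_` slot.

```python
from typing import List, Tuple

# Assumed unit scores for illustration only; not taken from the paper.
MATCH, MISMATCH, GAP = 1, -1, -1


def align(a: str, b: str) -> Tuple[str, str]:
    """Pairwise Needleman-Wunsch global alignment of two words ('-' = gap)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1].
        return MATCH if a[i - 1] == b[j - 1] else MISMATCH

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)

    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def pattern(a: str, b: str) -> str:
    """Keep agreeing columns; collapse each run of differing columns into '_'."""
    x, y = align(a, b)
    out: List[str] = []
    for c1, c2 in zip(x, y):
        if c1 == c2:
            out.append(c1)
        elif not out or out[-1] != "_":
            out.append("_")
    return "".join(out)


if __name__ == "__main__":
    print(align("walk", "walked"))     # ('walk--', 'walked')
    print(pattern("walk", "walked"))   # walk_   (concatenative: suffix slot)
    print(pattern("katab", "kutib"))   # k_t_b   (introflection: interleaved slots)
```

For the concatenative pair `walk`/`walked`, the shared material is a contiguous stem followed by one suffix slot. The illustrative transliterated pair `katab`/`kutib` instead yields the discontinuous pattern `k_t_b`, the kind of root-and-pattern structure the abstract calls introflection. A real multiple alignment would align a whole cluster of related forms at once rather than one pair at a time, which is what lets consistent slots stand out across many words.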

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kirschenbaum, A. (2013). Unsupervised Segmentation for Different Types of Morphological Processes Using Multiple Sequence Alignment. In: Dediu, AH., Martín-Vide, C., Mitkov, R., Truthe, B. (eds) Statistical Language and Speech Processing. SLSP 2013. Lecture Notes in Computer Science, vol 7978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39593-2_14

  • DOI: https://doi.org/10.1007/978-3-642-39593-2_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39592-5

  • Online ISBN: 978-3-642-39593-2

  • eBook Packages: Computer Science, Computer Science (R0)