Abstract
We present a fully unsupervised method for morphological segmentation. Unlike many morphological segmentation systems, our method is based on semantic features rather than orthographic features. To capture word meanings, we obtain word embeddings from a two-layer neural network [11]. The semantic similarity between words, computed from these neural word embeddings, forms our baseline segmentation model. We then model morphotactics with a bigram language model whose maximum likelihood estimates are derived from the initial segmentations produced by the baseline. Results show that semantic features help improve morphological segmentation, especially in agglutinating languages such as Turkish. Our method performs competitively with other unsupervised morphological segmentation systems.
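The two stages summarized in the abstract can be illustrated with a minimal sketch. It assumes a simple, hypothetical split rule: strip the longest suffix whose remaining prefix is an in-vocabulary word with cosine similarity to the full word above a threshold, and recurse on that prefix. It then estimates an unsmoothed MLE bigram model over the resulting morpheme sequences. The toy vectors, the 0.9 threshold, the `<s>`/`</s>` boundary symbols, and all function names are assumptions for illustration only; this is not the authors' implementation.

```python
import math
from collections import Counter

# Toy 3-d embeddings invented for illustration (hypothetical); real vectors
# would come from a word2vec-style model trained on a large corpus.
EMB = {
    "ev": [0.9, 0.1, 0.0],
    "evler": [0.85, 0.2, 0.05],
    "evlerde": [0.8, 0.25, 0.1],
    "e": [0.0, 0.1, 0.9],  # orthographic prefix of "ev" but semantically unrelated
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def baseline_segment(word, emb, threshold=0.9):
    """Hypothetical baseline: split off the longest suffix whose prefix is a
    known word semantically similar to `word`, then recurse on the prefix."""
    if word not in emb:
        return [word]
    for cut in range(len(word) - 1, 0, -1):  # longest prefix first
        prefix = word[:cut]
        if prefix in emb and cosine(emb[word], emb[prefix]) >= threshold:
            return baseline_segment(prefix, emb, threshold) + [word[cut:]]
    return [word]

def train_bigram(segmentations):
    """Unsmoothed MLE estimates P(m_i | m_{i-1}) = c(m_{i-1}, m_i) / c(m_{i-1})."""
    bigrams, history = Counter(), Counter()
    for morphs in segmentations:
        seq = ["<s>"] + morphs + ["</s>"]
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
    return {bg: c / history[bg[0]] for bg, c in bigrams.items()}

def log_prob(morphs, probs):
    """Log-probability of a morpheme sequence under the bigram model."""
    seq = ["<s>"] + morphs + ["</s>"]
    total = 0.0
    for bg in zip(seq, seq[1:]):
        p = probs.get(bg, 0.0)
        if p == 0.0:
            return float("-inf")  # unseen bigram has zero MLE probability
        total += math.log(p)
    return total

segs = [baseline_segment(w, EMB) for w in ["evler", "evlerde"]]
print(segs)  # [['ev', 'ler'], ['ev', 'ler', 'de']]
probs = train_bigram(segs)
print(log_prob(["ev", "ler", "de"], probs) > log_prob(["ev", "lerde"], probs))  # True
```

The toy entry "e" shows why a semantic criterion matters: it is an orthographic prefix of "ev", yet its vector is dissimilar, so no split is made. Plain MLE assigns zero probability to unseen bigrams; a practical system would likely need smoothing, which the abstract does not specify.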
References
Can, B., Manandhar, S.: Clustering morphological paradigms using syntactic categories. In: Peters, C., Di Nunzio, G.M., Kurimo, M., Mandl, T., Mostefa, D., Peñas, A., Roda, G. (eds.) CLEF 2009. LNCS, vol. 6241, pp. 641–648. Springer, Heidelberg (2010)
Clark, A.: Inducing syntactic categories by context distribution clustering. In: Proceedings of 2nd Workshop on Learning Language in Logic and 4th Conference on Computational Natural Language Learning, CoNLL 2000, vol. 7, pp. 91–94. Association for Computational Linguistics, Stroudsburg (2000)
Creutz, M., Lagus, K.: Inducing the morphological lexicon of a natural language from unannotated text. In: Proceedings of International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR 2005), pp. 106–113 (2005)
Creutz, M., Lagus, K.: Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. Technical report A81 (2005)
Creutz, M., Lagus, K.: Unsupervised models for morpheme segmentation and morphology learning. ACM Trans. Speech Lang. Process. 4, 3:1–3:34 (2007)
Goldwater, S., Griffiths, T.L., Johnson, M.: Interpolating between types and tokens by estimating power-law generators. In: Advances in Neural Information Processing Systems, vol. 18, p. 459 (2006)
Hankamer, J.: Finite state morphology and left to right phonology. In: Proceedings of 5th West Coast Conference on Formal Linguistics, January 1986
Kurimo, M., Lagus, K., Virpioja, S., Turunen, V.T.: Morpho Challenge 2010, June 2011. http://research.ics.tkk.fi/events/morphochallenge2010/. Accessed 4 Jul 2016
Lee, Y.K., Haghighi, A., Barzilay, R.: Modeling syntactic context improves morphological segmentation. In: Proceedings of 15th Conference on Computational Natural Language Learning, CoNLL 2011, pp. 1–9. Association for Computational Linguistics, Stroudsburg (2011)
Lignos, C.: Learning from unseen data. In: Kurimo, M., Virpioja, S., Turunen, V., Lagus, K. (eds.) Proceedings of Morpho Challenge 2010 Workshop, pp. 35–38. Aalto University, Espoo (2010)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013)
Narasimhan, K., Barzilay, R., Jaakkola, T.S.: An unsupervised method for uncovering morphological chains. Trans. Assoc. Comput. Linguist. (TACL) 3, 157–167 (2015)
Nicolas, L., Farré, J., Molinero, M.A.: Unsupervised learning of concatenative morphology based on frequency-related form occurrence. In: Kurimo, M., Virpioja, S., Turunen, V., Lagus, K. (eds.) Proceedings of Morpho Challenge 2010 Workshop, pp. 39–43. Aalto University, Espoo (2010)
Schone, P., Jurafsky, D.: Knowledge-free induction of inflectional morphologies. In: Proceedings of 2nd Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, NAACL 2001, pp. 1–9. Association for Computational Linguistics, Stroudsburg (2001)
Soricut, R., Och, F.: Unsupervised morphology induction using word embeddings. In: Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL, pp. 1627–1637 (2015)
Sproat, R.W.: Morphology and Computation. MIT Press, Cambridge (1992)
Deeplearning4j Development Team: Deeplearning4j: Open-Source Distributed Deep Learning for the JVM, Apache Software Foundation License 2.0, May 2016. http://deeplearning4j.org/
Acknowledgements
This research was supported by TUBITAK (The Scientific and Technological Research Council of Turkey) grant number 115E464.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Üstün, A., Can, B. (2016). Unsupervised Morphological Segmentation Using Neural Word Embeddings. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_4
DOI: https://doi.org/10.1007/978-3-319-45925-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45924-0
Online ISBN: 978-3-319-45925-7
eBook Packages: Computer Science, Computer Science (R0)