Unsupervised Word Categorization Using Self-Organizing Maps and Automatically Extracted Morphs

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2006 (IDEAL 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4224)

Abstract

Automatic creation of syntactic and semantic word categorizations is a challenging problem for highly inflecting languages because of severe data sparsity. Moreover, the study of colloquial language resources requires fully corpus-based tools. We present a completely automated approach for producing word categorizations for morphologically rich languages. A Self-Organizing Map (SOM) is used to cluster words based on the morphological properties of their context words. These properties are extracted using an automated morphological segmentation algorithm called Morfessor. Our experiments on a colloquial Finnish corpus of stories told by young children show that using unsupervised morphs as features leads to clearly improved clusterings compared with using whole context words as features.
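The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy token sequence, the hand-written `SEGMENTATION` table (a stand-in for Morfessor output), the one-word context window, the 3×3 map, and all training parameters are assumptions chosen only to keep the example small. Each word is represented by counts of the morphs found in its left and right neighbours, and a small SOM then places words with similar contexts on nearby map units.

```python
import math
import random

# Hypothetical, hand-written segmentations standing in for Morfessor output.
SEGMENTATION = {
    "talossa": ["talo", "ssa"],
    "autossa": ["auto", "ssa"],
    "talon": ["talo", "n"],
    "auton": ["auto", "n"],
}

def morphs(word):
    """Return the morphs of a word, falling back to the whole word."""
    return SEGMENTATION.get(word, [word])

def context_features(tokens, use_morphs=True):
    """Build one unit-length vector per word type from its neighbours' features."""
    split = morphs if use_morphs else (lambda w: [w])
    counts = {}
    for i, word in enumerate(tokens):
        for side, j in (("L", i - 1), ("R", i + 1)):
            if 0 <= j < len(tokens):
                feats = counts.setdefault(word, {})
                for m in split(tokens[j]):
                    feats[(side, m)] = feats.get((side, m), 0) + 1
    dims = sorted({key for feats in counts.values() for key in feats})
    vectors = {}
    for word, feats in counts.items():
        vec = [float(feats.get(d, 0)) for d in dims]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors[word] = [x / norm for x in vec]
    return vectors, dims

def best_matching_unit(units, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(units, key=lambda pos: sum((a - b) ** 2 for a, b in zip(units[pos], x)))

def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM with online updates and a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            bmu = best_matching_unit(units, x)
            for pos, w in units.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return units

# Toy token sequence (an assumption, not data from the paper).
tokens = "iso talossa on pieni autossa on iso talon pieni auton".split()
vectors, dims = context_features(tokens)
words = sorted(vectors)
som = train_som([vectors[w] for w in words])
placement = {w: best_matching_unit(som, vectors[w]) for w in words}
for w in words:
    print(w, placement[w])
```

Calling `context_features(tokens, use_morphs=False)` gives the whole-word baseline that the abstract compares against. With morph features, inflected forms such as "talossa" and "autossa" contribute a shared feature ("ssa") to their neighbours' vectors, which is one way sparsity can be reduced.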




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Klami, M., Lagus, K. (2006). Unsupervised Word Categorization Using Self-Organizing Maps and Automatically Extracted Morphs. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_109


  • DOI: https://doi.org/10.1007/11875581_109

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45485-4

  • Online ISBN: 978-3-540-45487-8

  • eBook Packages: Computer Science, Computer Science (R0)
