Comparative Study Concerning the Role of Surface Morphological Features in the Induction of Part-of-Speech Categories

  • Conference paper
Text, Speech and Dialogue (TSD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Abstract

Because they were developed mostly on English, existing systems of part-of-speech induction prioritize contextual and distributional features “external” to the word and attribute somewhat secondary importance to features derived from the word’s “internal” morphological and orthotactic regularities. Here we present preliminary empirical results supporting the claim that simple “internal” features derived from the occurrence frequencies of character n-grams can substantially increase the V-measure of POS categories obtained by repeated bisection k-way clustering of tokens contained in the Multext-East corpora. The data indicate that information contained in suffix features can furnish c(l)ues strong enough to outperform some much more complex probabilistic or HMM-based POS induction models, and that this can especially be the case for Western Slavic languages.
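The two ingredients named in the abstract, word-final character n-grams and the V-measure, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the maximum suffix length of 3, and the toy labels are assumptions made for illustration, and the repeated bisection clustering step is omitted. The V-measure is computed as the weighted harmonic mean of homogeneity and completeness, both defined via conditional entropy.

```python
import math
from collections import Counter


def suffix_features(word, max_n=3):
    """Return the word-final character n-grams of length 1..max_n.

    max_n=3 is an illustrative choice, not a value taken from the paper.
    """
    word = word.lower()
    return [word[-n:] for n in range(1, max_n + 1) if len(word) >= n]


def entropy(labels):
    """Shannon entropy H(labels), using the natural logarithm."""
    total = len(labels)
    return -sum((c / total) * math.log(c / total)
                for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """Conditional entropy H(targets | given)."""
    total = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum((c / total) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())


def v_measure(gold, clusters, beta=1.0):
    """V-measure of a clustering against gold classes.

    homogeneity  = 1 - H(gold | clusters) / H(gold)
    completeness = 1 - H(clusters | gold) / H(clusters)
    """
    h_gold = entropy(gold)
    h_clusters = entropy(clusters)
    homogeneity = (1.0 if h_gold == 0
                   else 1.0 - conditional_entropy(gold, clusters) / h_gold)
    completeness = (1.0 if h_clusters == 0
                    else 1.0 - conditional_entropy(clusters, gold) / h_clusters)
    if homogeneity + completeness == 0:
        return 0.0
    return ((1 + beta) * homogeneity * completeness
            / (beta * homogeneity + completeness))


if __name__ == "__main__":
    # Toy data: hypothetical gold tags and cluster ids, not corpus results.
    gold = ["N", "N", "V", "V"]
    print(suffix_features("clustering"))
    print(v_measure(gold, [0, 0, 1, 1]))  # clusters match the gold classes
    print(v_measure(gold, [0, 0, 0, 0]))  # everything in one cluster
```

In a full pipeline, each token would be represented by a vector of such suffix counts (possibly alongside contextual features) before clustering, and the induced cluster ids would be scored against the gold tags with `v_measure`.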

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hromada, D.D. (2014). Comparative Study Concerning the Role of Surface Morphological Features in the Induction of Part-of-Speech Categories. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_6

  • DOI: https://doi.org/10.1007/978-3-319-10816-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10815-5

  • Online ISBN: 978-3-319-10816-2

  • eBook Packages: Computer Science, Computer Science (R0)
