
Part-of-Speech Discovery by Clustering Contextual Features

  • Conference paper
Advances in Data Analysis

Abstract

An unsupervised method for part-of-speech discovery is presented whose aim is to induce a system of word classes from the distributional properties of words in raw text. Our assumption is that the word pair consisting of the left and right neighbors of a particular token is characteristic of the part of speech to be assigned at this position. Based on this assumption, we cluster all such word pairs according to the distribution of the middle words they enclose. The resulting centroid vectors are useful both for inducing a system of word classes and for correctly classifying ambiguous words.
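The abstract does not specify the clustering algorithm, the vector weighting, or the decision rule for ambiguous words. The following is therefore only a minimal Python sketch of the idea under stated assumptions. Each (left, right) neighbor pair is represented by the normalized counts of the middle words it encloses. The pairs are then grouped with spherical k-means (cosine similarity, farthest-first initialization), which is a stand-in and not necessarily the clustering method used in the paper. Finally, a token is classified by the centroid closest to its context pair. All function names (`context_vectors`, `kmeans`, `induce_classes`, `class_members`, `classify_token`) and the `min_count` threshold are hypothetical.

```python
# Minimal sketch: the clustering method, weighting and classification rule are
# assumptions for illustration, not the paper's exact procedure.
from collections import Counter, defaultdict
import math


def context_vectors(tokens, min_count=1):
    """Map each (left, right) neighbor pair to a Counter of the middle words it encloses."""
    contexts = defaultdict(Counter)
    for i in range(1, len(tokens) - 1):
        contexts[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    # Hypothetical frequency threshold: ignore pairs seen fewer than min_count times.
    return {pair: mids for pair, mids in contexts.items() if sum(mids.values()) >= min_count}


def normalize(vec):
    """Scale a sparse vector (dict word -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def kmeans(vectors, k, iterations=20):
    """Spherical k-means over context-pair vectors (an assumed stand-in clustering method)."""
    keys = sorted(vectors)  # fixed order keeps the result deterministic
    if not 0 < k <= len(keys):
        raise ValueError("k must be between 1 and the number of context pairs")
    # Farthest-first initialization: start with the first pair, then repeatedly add
    # the pair least similar to every centroid chosen so far.
    centroids = [vectors[keys[0]]]
    while len(centroids) < k:
        nxt = min(keys, key=lambda c: max(cosine(vectors[c], m) for m in centroids))
        centroids.append(vectors[nxt])
    assignment = {}
    for _ in range(iterations):
        # Assign every context pair to its most similar centroid.
        new = {c: max(range(k), key=lambda j: cosine(vectors[c], centroids[j])) for c in keys}
        if new == assignment:
            break  # converged
        assignment = new
        # Recompute each centroid as the normalized sum of its member vectors.
        for j in range(k):
            total = Counter()
            for c in keys:
                if assignment[c] == j:
                    total.update(vectors[c])
            if total:
                centroids[j] = normalize(total)
    return assignment, centroids


def induce_classes(tokens, k, min_count=1):
    """Cluster context pairs; each centroid is a weighted list of middle words, i.e. a word class."""
    vectors = {pair: normalize(mids) for pair, mids in context_vectors(tokens, min_count).items()}
    assignment, centroids = kmeans(vectors, k)
    return vectors, assignment, centroids


def class_members(centroid, n=10):
    """Return the n most heavily weighted words of a centroid (ties broken alphabetically)."""
    return [w for w, _ in sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


def classify_token(tokens, i, vectors, centroids):
    """Tag the token at position i with the class whose centroid best matches its context pair.

    The decision depends only on the neighbors, so the same ambiguous word can receive
    different classes in different contexts. Unseen pairs return None in this sketch.
    """
    pair = (tokens[i - 1], tokens[i + 1])
    if pair not in vectors:
        return None
    return max(range(len(centroids)), key=lambda j: cosine(vectors[pair], centroids[j]))


if __name__ == "__main__":
    text = "the dog runs . the cat runs . the dog sleeps . the cat sleeps . a dog eats . a cat eats ."
    toks = text.split()
    vecs, assign, cents = induce_classes(toks, k=4)
    for j, centroid in enumerate(cents):
        print(j, class_members(centroid, 3))
```

On the toy corpus in the `__main__` block, the four clusters separate the context pairs into determiner, noun, verb and punctuation positions. Because `classify_token` looks only at the neighbors, a word that is ambiguous between two classes would be tagged according to the context in which it occurs, which is the behavior the abstract's assumption implies.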




Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rapp, R. (2007). Part-of-Speech Discovery by Clustering Contextual Features. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_72
