Abstract
We present an unsupervised method for part-of-speech discovery that induces a system of word classes from the distributional properties of words in raw text. Our assumption is that the word pair formed by the left and right neighbors of a token is characteristic of the part of speech to be selected at that position. Based on this assumption, we cluster all such word pairs according to the distributions of their middle words. The resulting centroid vectors are useful for inducing a system of word classes and for correctly classifying ambiguous words.
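To make the idea concrete, the following is a minimal sketch of clustering neighbor pairs by their middle words. It is not the implementation from the paper: the function names (`context_pair_vectors`, `cluster_pairs`, `word_profile`), the toy corpus, the choice of k-means with cosine similarity, and the farthest-first seeding are all assumptions made for illustration, since the abstract does not specify the clustering algorithm or similarity measure.

```python
from collections import Counter, defaultdict
import math


def context_pair_vectors(tokens):
    """For each (left, right) neighbor pair, count the words observed between them."""
    pairs = defaultdict(Counter)
    for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
        pairs[(left, right)][middle] += 1
    return pairs


def unit(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_pairs(pairs, k, iterations=10):
    """Assumed clustering step: k-means with cosine similarity over middle-word vectors."""
    vectors = {p: unit(c) for p, c in pairs.items()}
    keys = sorted(vectors)
    # Deterministic farthest-first seeding: start from the first pair, then
    # repeatedly add the pair least similar to the seeds chosen so far.
    centroids = [vectors[keys[0]]]
    while len(centroids) < k:
        far = min(keys, key=lambda p: max(cosine(vectors[p], c) for c in centroids))
        centroids.append(vectors[far])
    assign = {}
    for _ in range(iterations):
        # Assign every pair to its most similar centroid.
        assign = {p: max(range(k), key=lambda i: cosine(vectors[p], centroids[i]))
                  for p in keys}
        # Recompute each centroid as the mean of its members' vectors.
        for i in range(k):
            members = [vectors[p] for p in keys if assign[p] == i]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[i] = {w: v / len(members) for w, v in total.items()}
    return assign, centroids


def word_profile(word, pairs, assign, k):
    """Count how often `word` occurs as the middle word of pairs in each cluster."""
    profile = [0] * k
    for pair, middles in pairs.items():
        profile[assign[pair]] += middles[word]
    return profile


# Toy corpus (hypothetical); real input would be a large raw-text corpus.
tokens = "the dog runs . the cat runs . a dog sleeps . a cat sleeps .".split()
pairs = context_pair_vectors(tokens)
assign, centroids = cluster_pairs(pairs, k=4)
print(word_profile("dog", pairs, assign, k=4))
```

In this sketch, each centroid is a distribution over middle words, and a word's profile across clusters hints at its class; a word spread over several clusters would be a candidate for an ambiguous word. The number of clusters `k` is chosen by hand here.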
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rapp, R. (2007). Part-of-Speech Discovery by Clustering Contextual Features. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_72
DOI: https://doi.org/10.1007/978-3-540-70981-7_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics (R0)