Part-of-Speech Discovery by Clustering Contextual Features

Rapp, Reinhard

doi:10.1007/978-3-540-70981-7_72


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)


Abstract

We present an unsupervised method for part-of-speech discovery that aims to induce a system of word classes from the distributional properties of words in raw text. Our assumption is that the word pair consisting of the left and right neighbors of a particular token is characteristic of the part of speech to be selected at this position. Based on this assumption, we cluster all such word pairs according to the patterns of their middle words. This gives us centroid vectors that are useful both for inducing a system of word classes and for correctly classifying ambiguous words.
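
The procedure in the abstract can be illustrated with a small Python sketch. It collects the left/right neighbour pair of every token, describes each pair by the words that occur between its members, clusters the pairs, and reads word classes off the cluster centroids. The abstract does not specify the weighting, the clustering algorithm or the number of classes, so everything below is an assumption for illustration only: raw middle-word counts normalised to unit length, plain k-means with deterministic farthest-first seeding as a stand-in clusterer, k = 4, and an invented toy corpus. The function names `context_vectors`, `kmeans` and `classes_of` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: unit-length count vectors and k-means are assumptions,
# not settings documented in the abstract.

def context_vectors(tokens):
    """Map each (left, right) neighbour pair to a unit-length vector of its middle words."""
    counts = defaultdict(Counter)
    for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
        counts[(left, right)][middle] += 1
    vectors = {}
    for pair, counter in counts.items():
        norm = math.sqrt(sum(c * c for c in counter.values()))
        vectors[pair] = {w: c / norm for w, c in counter.items()}
    return vectors

def sq_dist(u, v):
    """Squared Euclidean distance between two sparse dict vectors."""
    return sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in set(u) | set(v))

def mean(vectors):
    """Component-wise mean of a non-empty list of sparse dict vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for k, x in vec.items():
            total[k] += x
    return {k: x / len(vectors) for k, x in total.items()}

def kmeans(vectors, k, iterations=50):
    """Cluster the pair vectors; returns (pair -> cluster index, list of centroids)."""
    pairs = list(vectors)
    # Deterministic farthest-first seeding: each new seed is the pair
    # farthest from its nearest already-chosen seed.
    seeds = [pairs[0]]
    while len(seeds) < k:
        seeds.append(max(pairs, key=lambda p: min(sq_dist(vectors[p], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    assignment = {}
    for _ in range(iterations):
        # Assign every pair to its nearest centroid; stop once nothing moves.
        new = {p: min(range(k), key=lambda c: sq_dist(vectors[p], centroids[c])) for p in pairs}
        if new == assignment:
            break
        assignment = new
        for c in range(k):
            members = [vectors[p] for p in pairs if assignment[p] == c]
            if members:
                centroids[c] = mean(members)
    return assignment, centroids

def classes_of(word, centroids):
    """Indices of the induced classes whose centroid gives the word non-zero weight."""
    return [c for c, centroid in enumerate(centroids) if centroid.get(word, 0.0) > 0.0]

# Toy corpus (invented for illustration); "saw" occurs as a noun and as a verb.
tokens = ("the dog barks . a cat sleeps . the cat barks . a dog sleeps . "
          "a saw sleeps . the dog saw .").split()
assignment, centroids = kmeans(context_vectors(tokens), k=4)
noun_slot = assignment[("a", "sleeps")]   # slot between a determiner and a verb
verb_slot = assignment[("dog", ".")]      # slot between a noun and a full stop
print(sorted(classes_of("saw", centroids)) == sorted([noun_slot, verb_slot]))  # True
```

In this toy run the slot between `a` and `sleeps` and the slot between `dog` and `.` fall into different clusters, and the ambiguous word `saw` receives weight in both of their centroids. A token would then be labelled with the cluster of its own neighbour pair; pairs never seen in training would need a fallback that this sketch does not provide. Whether the chapter uses the centroids in exactly this way is not stated in the abstract.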

Author information

Authors and Affiliations

GRLMC, Universitat Rovira i Virgili, Tarragona, Spain
Reinhard Rapp

Editor information

Editors and Affiliations

Department of Business Administration and Economics, Bielefeld University, Universitätsstr. 25, 33501, Bielefeld, Germany
Reinhold Decker
Department of Economics, Freie Universität Berlin, Garystraße 21, 14195, Berlin, Germany
Hans-J. Lenz

About this paper

Cite this paper

Rapp, R. (2007). Part-of-Speech Discovery by Clustering Contextual Features. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_72

DOI: https://doi.org/10.1007/978-3-540-70981-7_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics (R0)