Fine-Grained POS Tagging of German Tweets

Rehbein, Ines

doi:10.1007/978-3-642-40722-2_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8105)

Abstract

This paper presents the first work on POS tagging German Twitter data, showing that despite the noisy and often cryptic nature of the data a fine-grained analysis of POS tags on Twitter microtext is feasible. Our CRF-based tagger achieves an accuracy of around 89% when trained on LDA word clusters, features from an automatically created dictionary and additional out-of-domain training data.
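
The abstract names three feature sources for the CRF tagger without giving details. As a rough illustration only, the sketch below builds one feature dictionary per token, combining surface cues with a word-cluster ID and dictionary tag candidates. The names `WORD_CLUSTERS`, `TAG_DICTIONARY`, `token_features`, and `sentence_features`, the cluster IDs, and the specific features are all hypothetical and are not taken from the paper; a list of such dictionaries per sentence is a shape that CRF toolkits commonly accept as input.

```python
from typing import Dict, List, Set

# Hypothetical resources, invented for illustration only.
# A real setup would load cluster IDs induced from unlabeled tweets and
# tag candidates from an automatically created dictionary.
WORD_CLUSTERS: Dict[str, str] = {"ich": "c12", "hab": "c07", "lol": "c31"}
TAG_DICTIONARY: Dict[str, Set[str]] = {
    "ich": {"PPER"},
    "hab": {"VAFIN", "VVFIN"},
}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    lower = word.lower()
    feats: Dict[str, object] = {
        # Surface cues that are typical of microtext.
        "lower": lower,
        "suffix3": lower[-3:],
        "is_hashtag": word.startswith("#"),
        "is_mention": word.startswith("@"),
        "is_url": lower.startswith(("http://", "https://", "www.")),
        "has_digit": any(ch.isdigit() for ch in word),
        # Cluster ID of the word, or "UNK" if the word was never clustered.
        "cluster": WORD_CLUSTERS.get(lower, "UNK"),
    }
    # One binary feature per tag the dictionary lists for this word form.
    for tag in sorted(TAG_DICTIONARY.get(lower, ())):
        feats["dict_tag=" + tag] = True
    # Neighbouring word forms as context, with boundary markers.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token of a tokenized tweet."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

For example, `sentence_features(["Ich", "hab", "#hunger", "lol"])` yields four dictionaries. Adding out-of-domain training data would not change the feature function; under this sketch it would simply mean passing more labeled sentences through the same function at training time.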

Author information

Authors and Affiliations

SFB 632 “Information Structure”, German Department, Potsdam University, Germany
Ines Rehbein

Editor information

Editors and Affiliations

Technical University Darmstadt, 64289 Darmstadt, Germany, and German Institute for International Educational Research, 60486 Frankfurt, Germany
Iryna Gurevych
Technical University Darmstadt, 64289 Darmstadt, Germany
Chris Biemann
Technical University Darmstadt, 64289 Darmstadt, Germany, and German Institute for International Educational Research, 60486 Frankfurt, Germany
Torsten Zesch

Cite this paper

Rehbein, I. (2013). Fine-Grained POS Tagging of German Tweets. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_17

DOI: https://doi.org/10.1007/978-3-642-40722-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40721-5
Online ISBN: 978-3-642-40722-2
eBook Packages: Computer Science, Computer Science (R0)
