
Towards the Automatic Sentiment Analysis of German News and Forum Documents

  • Conference paper
Innovations for Community Services (I4CS 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 717)


Abstract

Fully automated sentiment analysis of large text collections is an important task in many application scenarios. Sentiment analysis is challenging due to domain-specific language styles and the variety of sentiment indicators. Annotated datasets are the basis for learning powerful sentiment classifiers, but for many domains, and especially for non-English texts, hardly any datasets exist. To support the development of sentiment classifiers, we have created two corpora. The first corpus is built from German news articles. Although news articles should be objective, they often evoke subjective emotions. The second corpus consists of annotated messages from a German telecommunication forum. In this paper, we describe the process of creating the corpora and discuss our approach for tracing sentiment values, defining clear rules for assigning sentiment scores. Given the corpora, we train classifiers that yield good classification results and establish valuable baselines for sentiment analysis. We compare the learned classification strategies and discuss how the approaches can be transferred to new scenarios.


Notes

  1. Keyword/phrase set: abgespeist, abzocken, ärgere, ärgerlich, arsch, beschwerde, blöd, desaster, die nase voll, dumm, dümmer, enttäuscht, ex-kunde, frechheit, frustriert, grottig, hinhalten, hohn, idiot, kann doch nicht so schwer sein, katastrophe, minderwertigste, nervt, nicht kapiert, opfer, rausnehmen, reklamiert, schämen, schnauze, scheiss, schlimmer, schuld, teufel, unfassbar, unzufrieden, verärgert, vergebens, versagt, verschlimmbesserung, verschonen, verschont, vertrösten, verzweifeln, wird mir schlecht.

  2. cf. https://lucene.apache.org/core/6_2_0/analyzers-common.
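
A keyword/phrase set such as the one in note 1 can be applied to forum messages as a simple lexicon lookup. The following is a minimal sketch only: this page does not say how the authors match the set against text, so the case-insensitive substring matching, the function name `find_negative_phrases`, and the shortened list `NEGATIVE_PHRASES` are all assumptions made for illustration.

```python
import re

# A few entries copied from the keyword/phrase set in note 1.
# Assumption: the list is shortened here purely for readability.
NEGATIVE_PHRASES = [
    "abzocken",
    "ärgerlich",
    "die nase voll",
    "enttäuscht",
    "frechheit",
    "katastrophe",
    "unzufrieden",
]


def find_negative_phrases(message, phrases=NEGATIVE_PHRASES):
    """Return the phrases that occur in the message.

    Hypothetical matching rule: lowercase the message, collapse runs of
    whitespace, then test each phrase as a plain substring.
    """
    normalized = re.sub(r"\s+", " ", message.lower())
    return [phrase for phrase in phrases if phrase in normalized]


if __name__ == "__main__":
    print(find_negative_phrases("Ich habe  die Nase voll, so eine Frechheit!"))
    print(find_negative_phrases("Alles bestens, danke."))
```

Plain substring matching is deliberately crude: a short entry such as "dumm" would also fire inside longer, unrelated words. A tokenizer or stemmer, such as the Lucene analyzers referenced in note 2, may be a better fit where that matters.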


Acknowledgment

This work was supported in part by the German Federal Ministry of Education and Research (BMBF) under the grant number 01IS16046.

Author information


Corresponding author

Correspondence to Andreas Lommatzsch.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lommatzsch, A., Bütow, F., Ploch, D., Albayrak, S. (2017). Towards the Automatic Sentiment Analysis of German News and Forum Documents. In: Eichler, G., Erfurth, C., Fahrnberger, G. (eds) Innovations for Community Services. I4CS 2017. Communications in Computer and Information Science, vol 717. Springer, Cham. https://doi.org/10.1007/978-3-319-60447-3_2


  • DOI: https://doi.org/10.1007/978-3-319-60447-3_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60446-6

  • Online ISBN: 978-3-319-60447-3

  • eBook Packages: Computer Science, Computer Science (R0)
