Multilingual and Language-Agnostic Recognition of Emotions, Valence and Arousal in Large-Scale Multi-domain Text Reviews

  • Conference paper
Human Language Technology. Challenges for Computer Science and Linguistics (LTC 2019)

Abstract

In this article we present extended results obtained on the multidomain dataset of Polish text reviews collected within the Sentimenti project. We present preliminary results of classification models trained and tested on 7,000 texts annotated by over 20,000 individuals using valence, arousal, and eight basic emotions from Plutchik’s model. Additionally, we present an extended evaluation using deep neural multilingual models and language-agnostic regressors on the translation of the original collection into 11 languages.

This work was financed by (1) the National Science Centre, Poland, project no. 2019/33/B/HS2/02814; (2) the Polish Ministry of Education and Science, CLARIN-PL; (3) the European Regional Development Fund as a part of the 2014-2020 Smart Growth Operational Programme, CLARIN – Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19; (4) the National Centre for Research and Development, Poland, grant no. POIR.01.01.01-00-0472/16 – Sentimenti (https://sentimenti.com).

Notes

  1. The largest dictionary of English, Oxford English Dictionary, for example, contains around 600,000 words in its online version https://public.oed.com/about.

  2. www.znanylekarz.pl.

  3. pl.tripadvisor.com.

  4. naukawpolsce.pap.pl/zdrowie.

  5. hotelarstwo.net, www.e-hotelarstwo.com.

  6. https://clarin-pl.eu/dspace/handle/11321/606.

  7. https://www.deepl.com/.

  8. https://osf.io/f79bj.

  9. https://github.com/CLARIN-PL/human-bias.

Author information

Correspondence to Jan Kocoń.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Kocoń, J. et al. (2022). Multilingual and Language-Agnostic Recognition of Emotions, Valence and Arousal in Large-Scale Multi-domain Text Reviews. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2019. Lecture Notes in Computer Science, vol 13212. Springer, Cham. https://doi.org/10.1007/978-3-031-05328-3_14

  • DOI: https://doi.org/10.1007/978-3-031-05328-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05327-6

  • Online ISBN: 978-3-031-05328-3

  • eBook Packages: Computer Science, Computer Science (R0)
