Multilingual and Language-Agnostic Recognition of Emotions, Valence and Arousal in Large-Scale Multi-domain Text Reviews

Kocoń, Jan; Miłkowski, Piotr; Wierzba, Małgorzata; Konat, Barbara; Klessa, Katarzyna; Janz, Arkadiusz; Riegel, Monika; Juszczyk, Konrad; Grimling, Damian; Marchewka, Artur; Piasecki, Maciej

doi:10.1007/978-3-031-05328-3_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13212)

Included in the following conference series:

Language and Technology Conference


Abstract

We present extended results on the multi-domain dataset of Polish text reviews collected within the Sentimenti project. We report preliminary results for classification models trained and tested on 7,000 texts annotated by over 20,000 individuals for valence, arousal, and the eight basic emotions of Plutchik’s model. We also report an extended evaluation of deep neural multilingual models and language-agnostic regressors on translations of the original collection into 11 languages.
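The annotation scheme named in the abstract gives each text ten affective dimensions: valence, arousal, and Plutchik's eight basic emotions. As a rough illustration of how ratings from many annotators might be turned into one target vector per text, the sketch below averages the scores per dimension. The function name `aggregate_ratings`, the rating values, and the choice of a plain mean are all assumptions made for this example; the abstract does not state how the ratings were aggregated.

```python
from statistics import mean

# Plutchik's eight basic emotions plus the two affective dimensions.
EMOTIONS = ["joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation"]
DIMENSIONS = ["valence", "arousal"] + EMOTIONS


def aggregate_ratings(ratings):
    """Average per-annotator ratings into one target vector per text.

    `ratings` is a list of dicts (one per annotator) mapping a dimension
    name to a numeric score. A dimension an annotator did not rate is
    skipped for that annotator. Returns a dict with one mean score per
    dimension, or None where no annotator rated the dimension.
    Mean aggregation is an assumption of this sketch.
    """
    targets = {}
    for dim in DIMENSIONS:
        scores = [r[dim] for r in ratings if dim in r]
        targets[dim] = mean(scores) if scores else None
    return targets


# Hypothetical ratings from three annotators for a single review.
example = [
    {"valence": -2, "arousal": 3, "anger": 4, "sadness": 2},
    {"valence": -1, "arousal": 2, "anger": 3, "sadness": 1},
    {"valence": -3, "arousal": 4, "anger": 5},
]
targets = aggregate_ratings(example)
```

Vectors of this shape could serve as regression targets, or be thresholded into classes for a classification setup; either choice is again an assumption rather than something the abstract specifies.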

This work was financed by (1) the National Science Centre, Poland, project no. 2019/33/B/HS2/02814; (2) the Polish Ministry of Education and Science, CLARIN-PL; (3) the European Regional Development Fund as a part of the 2014-2020 Smart Growth Operational Programme, CLARIN – Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19; (4) the National Centre for Research and Development, Poland, grant no. POIR.01.01.01-00-0472/16 – Sentimenti (https://sentimenti.com).


Notes

1. The largest dictionary of English, the Oxford English Dictionary, for example, contains around 600,000 words in its online version: https://public.oed.com/about.
2. www.znanylekarz.pl.
3. pl.tripadvisor.com.
4. naukawpolsce.pap.pl/zdrowie.
5. hotelarstwo.net, www.e-hotelarstwo.com.
6. https://clarin-pl.eu/dspace/handle/11321/606.
7. https://www.deepl.com/.
8. https://osf.io/f79bj.
9. https://github.com/CLARIN-PL/human-bias.


Author information

Authors and Affiliations

Wroclaw University of Science and Technology, Wrocław, Poland
Jan Kocoń, Piotr Miłkowski, Arkadiusz Janz & Maciej Piasecki
Laboratory of Brain Imaging, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
Małgorzata Wierzba, Monika Riegel & Artur Marchewka
Adam Mickiewicz University, Poznań, Poland
Barbara Konat, Katarzyna Klessa & Konrad Juszczyk
Sentimenti Sp. z o.o., Poznań, Poland
Damian Grimling


Corresponding author

Correspondence to Jan Kocoń.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
LIMSI-CNRS, Orsay, France
Patrick Paroubek
Adam Mickiewicz University, Poznań, Poland
Marek Kubis


About this paper

Cite this paper

Kocoń, J. et al. (2022). Multilingual and Language-Agnostic Recognition of Emotions, Valence and Arousal in Large-Scale Multi-domain Text Reviews. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2019. Lecture Notes in Computer Science, vol 13212. Springer, Cham. https://doi.org/10.1007/978-3-031-05328-3_14


DOI: https://doi.org/10.1007/978-3-031-05328-3_14
Published: 05 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05327-6
Online ISBN: 978-3-031-05328-3
eBook Packages: Computer Science, Computer Science (R0)
