Abstract
In this article we present extended results obtained on the multidomain dataset of Polish text reviews collected within the Sentimenti project. We present preliminary results of classification models trained and tested on 7,000 texts annotated by over 20,000 individuals using valence, arousal, and eight basic emotions from Plutchik’s model. Additionally, we present an extended evaluation using deep neural multilingual models and language-agnostic regressors on the translation of the original collection into 11 languages.
This work was financed by (1) the National Science Centre, Poland, project no. 2019/33/B/HS2/02814; (2) the Polish Ministry of Education and Science, CLARIN-PL; (3) the European Regional Development Fund as a part of the 2014-2020 Smart Growth Operational Programme, CLARIN – Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19; (4) the National Centre for Research and Development, Poland, grant no. POIR.01.01.01-00-0472/16 – Sentimenti (https://sentimenti.com).
Notes
- 1. The largest dictionary of English, the Oxford English Dictionary, for example, contains around 600,000 words in its online version (https://public.oed.com/about).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Kocoń, J. et al. (2022). Multilingual and Language-Agnostic Recognition of Emotions, Valence and Arousal in Large-Scale Multi-domain Text Reviews. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2019. Lecture Notes in Computer Science, vol 13212. Springer, Cham. https://doi.org/10.1007/978-3-031-05328-3_14
DOI: https://doi.org/10.1007/978-3-031-05328-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05327-6
Online ISBN: 978-3-031-05328-3
eBook Packages: Computer Science, Computer Science (R0)