MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of Consumer Reviews

Kocoń, Jan; Miłkowski, Piotr; Kanclerz, Kamil

doi:10.1007/978-3-030-77964-1_24

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12743)

Included in the following conference series:

International Conference on Computational Science


Abstract

This article presents MultiEmo, a new benchmark data set for multilingual sentiment analysis covering 11 languages. The collection contains consumer reviews from four domains: medicine, hotels, products and university. The original Polish reviews comprise 8,216 documents consisting of 57,466 sentences. The reviews were manually annotated with sentiment at the level of the whole document and at the level of individual sentences (3 annotators per element). We achieved a high Positive Specific Agreement value of 0.91 for texts and 0.88 for sentences. The collection was then machine-translated into English, Chinese, Italian, Japanese, Russian, German, Spanish, French, Dutch and Portuguese. MultiEmo is publicly available under the MIT Licence. We present evaluation results for recent cross-lingual deep learning models such as XLM-RoBERTa, MultiFiT and LASER+BiLSTM. We compare the models along three aspects: multilingual, multilevel and multidomain knowledge transfer ability.
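
Positive Specific Agreement (PSA) for a given label between two annotators is commonly defined as 2a / (2a + b + c), where a counts items both annotators assigned that label and b and c count items only one of them assigned it; this makes it equivalent to the F1 score between the two annotators. The sketch below is a minimal illustration, assuming agreement is averaged over all annotator pairs and all labels (a macro average). The abstract does not state how the reported 0.91 and 0.88 values were aggregated, and the function names and toy labels here are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Optional, Sequence


def positive_specific_agreement(
    r1: Sequence[Hashable], r2: Sequence[Hashable], label: Hashable
) -> Optional[float]:
    """PSA for one label between two annotators: 2a / (2a + b + c).

    Returns None when neither annotator used the label (PSA is undefined).
    """
    if len(r1) != len(r2):
        raise ValueError("both annotators must label the same items")
    a = sum(1 for x, y in zip(r1, r2) if x == label and y == label)
    b = sum(1 for x, y in zip(r1, r2) if x == label and y != label)
    c = sum(1 for x, y in zip(r1, r2) if x != label and y == label)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else None


def mean_psa(
    annotations: Sequence[Sequence[Hashable]], labels: Sequence[Hashable]
) -> float:
    """Assumed aggregation: macro average over annotator pairs and labels."""
    scores = []
    for r1, r2 in combinations(annotations, 2):
        for label in labels:
            score = positive_specific_agreement(r1, r2, label)
            if score is not None:  # skip labels neither annotator used
                scores.append(score)
    return sum(scores) / len(scores)


# Hypothetical toy data: 3 annotators labelling the same 4 texts.
ann_a = ["pos", "neg", "neu", "pos"]
ann_b = ["pos", "neg", "pos", "pos"]
ann_c = ["pos", "neg", "neu", "neg"]

print(positive_specific_agreement(ann_a, ann_b, "pos"))  # → 0.8
print(round(mean_psa([ann_a, ann_b, ann_c], ["pos", "neg", "neu"]), 3))  # → 0.589
```

Returning None for unused labels keeps an undefined ratio from silently counting as perfect or zero agreement. A pooled (micro) variant, which sums a, b and c over all pairs before dividing, is an equally plausible reading of the metric.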



Acknowledgements

Funded by the Polish Ministry of Education and Science, CLARIN-PL Project.

Author information

Authors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Jan Kocoń, Piotr Miłkowski & Kamil Kanclerz

Corresponding authors

Correspondence to Jan Kocoń, Piotr Miłkowski or Kamil Kanclerz.

Editor information

Editors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
Ludwig-Maximilians-Universität München, Munich, Germany
Dieter Kranzlmüller
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M. A. Sloot

About this paper

Cite this paper

Kocoń, J., Miłkowski, P., Kanclerz, K. (2021). MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of Consumer Reviews. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12743. Springer, Cham. https://doi.org/10.1007/978-3-030-77964-1_24

DOI: https://doi.org/10.1007/978-3-030-77964-1_24
Published: 09 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77963-4
Online ISBN: 978-3-030-77964-1
eBook Packages: Computer Science, Computer Science (R0)