MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of Consumer Reviews

  • Conference paper
Computational Science – ICCS 2021 (ICCS 2021)

Abstract

This article presents MultiEmo, a new benchmark data set for multilingual sentiment analysis covering 11 languages. The collection contains consumer reviews from four domains: medicine, hotels, products and university. The original Polish collection comprises 8,216 reviews consisting of 57,466 sentences. The reviews were manually annotated with sentiment at the level of the whole document and at the level of individual sentences (3 annotators per element). We achieved a high Positive Specific Agreement value of 0.91 for texts and 0.88 for sentences. The collection was then automatically translated into English, Chinese, Italian, Japanese, Russian, German, Spanish, French, Dutch and Portuguese. MultiEmo is publicly available under the MIT Licence. We present the results of an evaluation using recent cross-lingual deep learning models: XLM-RoBERTa, MultiFiT and LASER+BiLSTM. We compare the quality of the models along three aspects: multilingual, multilevel and multidomain knowledge transfer ability.
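
The agreement figures above are Positive Specific Agreement (PSA) values. As a rough illustration only, the sketch below assumes the common pairwise definition PSA = 2a / (2a + b + c), where a counts the items that both annotators assign to a given category and b and c count the items that only one of them assigns to it. It also assumes that three annotators are handled by averaging over annotator pairs, which the abstract does not specify. The function names and the toy labels are hypothetical and are not taken from the MultiEmo code or data.

```python
from itertools import combinations


def positive_specific_agreement(labels_a, labels_b, category):
    """PSA for one category between two annotators: 2a / (2a + b + c)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    both = only_a = only_b = 0
    for x, y in zip(labels_a, labels_b):
        if x == category and y == category:
            both += 1          # a: both annotators chose the category
        elif x == category:
            only_a += 1        # b: only the first annotator chose it
        elif y == category:
            only_b += 1        # c: only the second annotator chose it
    denominator = 2 * both + only_a + only_b
    # Undefined when neither annotator ever used the category.
    return 2 * both / denominator if denominator else float("nan")


def mean_pairwise_psa(annotations, category):
    """Assumed aggregation: average PSA over all pairs of annotators."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("at least two annotators are required")
    return sum(
        positive_specific_agreement(a, b, category) for a, b in pairs
    ) / len(pairs)


# Toy example with three hypothetical annotators labelling four items.
annotator_1 = ["pos", "pos", "neg", "neu"]
annotator_2 = ["pos", "neg", "neg", "pos"]
annotator_3 = ["pos", "pos", "neg", "neu"]
psa_positive = mean_pairwise_psa([annotator_1, annotator_2, annotator_3], "pos")
```

Because PSA is computed per category, a corpus-level figure such as the ones reported above would additionally require some aggregation across sentiment classes; how that was done is not stated here.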

Notes

  1. https://www.deepl.com/

  2. https://clarin-pl.eu/dspace/handle/11321/798

  3. http://ws.clarin-pl.eu/multiemo

  4. http://clarin-pl.eu/

  5. https://github.com/CLARIN-PL/multiemo

Acknowledgements

Funded by the Polish Ministry of Education and Science, CLARIN-PL Project.

Author information

Corresponding authors

Correspondence to Jan Kocoń, Piotr Miłkowski or Kamil Kanclerz.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kocoń, J., Miłkowski, P., Kanclerz, K. (2021). MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of Consumer Reviews. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12743. Springer, Cham. https://doi.org/10.1007/978-3-030-77964-1_24

  • DOI: https://doi.org/10.1007/978-3-030-77964-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77963-4

  • Online ISBN: 978-3-030-77964-1

  • eBook Packages: Computer Science, Computer Science (R0)
