Abstract
This paper deals with cross-lingual sentiment analysis in Czech, English, and French. We perform zero-shot cross-lingual classification using five linear transformations combined with LSTM- and CNN-based classifiers. We compare the performance of the individual transformations and, in addition, evaluate the transformation-based approach against existing state-of-the-art BERT-like models. We show that pre-trained embeddings from the target domain are crucial to improving cross-lingual classification results, unlike in monolingual classification, where the effect is less pronounced.
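The linear-transformation approach mentioned in the abstract can be sketched as follows: given a small bilingual dictionary whose entries pair a source-language word vector with the vector of its translation, a mapping W from the source embedding space into the target space can be estimated by least squares. This is a minimal illustrative sketch with toy data; the variable names and dimensions are assumptions, not taken from the paper.

```python
import numpy as np

# Toy bilingual dictionary: rows of X are source-language word vectors,
# rows of Y are the target-language vectors of their translations.
rng = np.random.default_rng(0)
d = 4                       # embedding dimension (real embeddings use e.g. 300)
X = rng.normal(size=(20, d))
W_true = rng.normal(size=(d, d))
Y = X @ W_true              # pretend the translation pairs are linearly related

# Least-squares estimate of the mapping:
# minimize ||X W - Y||_F over W
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Zero-shot transfer: map a source-language vector into the target space,
# then feed the mapped vector to a classifier trained in the target space.
mapped = X @ W
print(np.allclose(mapped, Y, atol=1e-6))
```

In practice the dictionary pairs are noisy, so the mapping only approximates the target vectors; the zero-shot classifier then operates on these approximately aligned embeddings.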
Notes
- 1.
Here, we consider sentiment analysis and sentiment classification as the same task.
- 6.
Matrix W is orthogonal when it is a square matrix and the columns and rows are orthonormal vectors (\({W}^{\textsf{T}}{W} = {W}{W}^{\textsf{T}} = I\), where I is the identity matrix).
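When W is constrained to be orthogonal as in the note above, the mapping minimizing the Frobenius-norm error has a closed form via SVD (the orthogonal Procrustes solution). A minimal sketch, assuming toy data; the function name is illustrative:

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Return the orthogonal W minimizing ||X W - Y||_F (closed form via SVD)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(1)
d = 5
X = rng.normal(size=(30, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # a random orthogonal matrix
Y = X @ Q                                     # targets related by an orthogonal map

W = orthogonal_procrustes(X, Y)
# W satisfies the definition from the note: W^T W = W W^T = I
print(np.allclose(W.T @ W, np.eye(d)))
print(np.allclose(W, Q))
```

The orthogonality constraint preserves dot products and vector norms, which keeps the monolingual structure of the source space intact after mapping.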
- 7.
Available from https://fasttext.cc/docs/en/crawl-vectors.html.
- 8.
For example, the column labeled EN-s \(\Rightarrow \) CS-t means that the English embedding space was transformed into the Czech space. English is the source language (-s suffix) and Czech is the target language (-t suffix); in other words, the English dataset is used for training and the Czech one for evaluation.
- 9.
We provide the details of the hyper-parameters used in our GitHub repository.
Acknowledgments
This work has been partly supported by ERDF "Research and Development of Intelligent Components of Advanced Technologies for the Pilsen Metropolitan Area (InteCom)" (no.: CZ.02.1.01/0.0/0.0/17 048/0007267) and by Grant No. SGS-2022-016 "Advanced methods of data processing and analysis". Computational resources were supplied by the project "e-Infrastruktura CZ" (e-INFRA CZ LM2018140), supported by the Ministry of Education, Youth and Sports of the Czech Republic.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Přibáň, P., Šmíd, J., Mištera, A., Král, P. (2022). Linear Transformations for Cross-lingual Sentiment Analysis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science(), vol 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16269-5
Online ISBN: 978-3-031-16270-1
eBook Packages: Computer Science (R0)