
Linear Transformations for Cross-lingual Sentiment Analysis

  • Conference paper
  • Text, Speech, and Dialogue (TSD 2022)

Abstract

This paper deals with cross-lingual sentiment analysis in the Czech, English, and French languages. We perform zero-shot cross-lingual classification using five linear transformations combined with LSTM- and CNN-based classifiers. We compare the performance of the individual transformations and, in addition, confront the transformation-based approach with existing state-of-the-art BERT-like models. We show that pre-trained embeddings from the target domain are crucial to improving the cross-lingual classification results, unlike in monolingual classification, where the effect is not so distinctive.
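The transformation-based transfer described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact pipeline: given a bilingual dictionary whose entries pair source- and target-language word vectors, a linear map W is fitted so that mapped source vectors land near their translations; a classifier trained on target-space embeddings can then score mapped source-language text zero-shot. The toy matrices stand in for real fastText embeddings.

```python
import numpy as np

# Toy stand-ins for real embeddings: each row of X is a source-language
# word vector, the matching row of Y is its target-language translation
# (a bilingual-dictionary entry).
rng = np.random.default_rng(0)
d = 4   # embedding dimension (300 for fastText in practice)
n = 50  # bilingual dictionary size
W_true = rng.normal(size=(d, d))
X = rng.normal(size=(n, d))   # source-space vectors
Y = X @ W_true                # matching target-space vectors

# Unconstrained least-squares fit of min_W ||X W - Y||_F.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Source vectors mapped through W now live in the target space, so a
# sentiment classifier trained there can score them zero-shot.
mapped = X @ W
print(np.allclose(mapped, Y))  # True on this exact-fit toy data
```

In practice the dictionary pairs are noisy, so `mapped` only approximates `Y`; the paper's point is that several such linear transformations can be compared under the same downstream classifiers.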


Notes

  1. Here, we consider sentiment analysis and sentiment classification as the same task.

  2. https://github.com/pauli31/linear-transformation-4-cs-sa.

  3. https://www.csfd.cz.

  4. https://www.allocine.fr.

  5. https://www.imdb.com.

  6. The matrix W is orthogonal when it is a square matrix whose columns and rows are orthonormal vectors (\({W}^{\textsf{T}}{W} = {W}{W}^{\textsf{T}} = I\), where I is the identity matrix).

  7. Available from https://fasttext.cc/docs/en/crawl-vectors.html.

  8. For example, the column labeled EN-s \(\Rightarrow \) CS-t means that the English space was transformed into the Czech space. English is the source language (-s suffix) and Czech is the target language (-t suffix); in other words, the English dataset is used for training and the Czech dataset for evaluation.

  9. We provide the details of the used hyper-parameters in our GitHub repository.
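The orthogonality condition in footnote 6 is the constraint behind the orthogonal variant of such mappings; under it, the least-squares-optimal map has a closed-form solution via the SVD (the orthogonal Procrustes problem). A minimal sketch, assuming toy matrices in place of real embeddings:

```python
import numpy as np

# Footnote 6: W is orthogonal iff W^T W = W W^T = I. Such a map preserves
# dot products and norms. The optimum of min ||X W - Y||_F over orthogonal
# W is the orthogonal Procrustes solution W = U V^T, where U S V^T is the
# SVD of X^T Y.
rng = np.random.default_rng(1)
d, n = 4, 50
X = rng.normal(size=(n, d))  # source-space vectors
Y = rng.normal(size=(n, d))  # target-space vectors

U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt  # orthogonal by construction

# Verify the footnote's condition up to floating-point error.
print(np.allclose(W.T @ W, np.eye(d)))  # True
print(np.allclose(W @ W.T, np.eye(d)))  # True
```

Because an orthogonal W preserves distances, the monolingual geometry of the source embedding space survives the mapping, which is one motivation for imposing this constraint.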


Acknowledgments

This work has been partly supported by ERDF “Research and Development of Intelligent Components of Advanced Technologies for the Pilsen Metropolitan Area (InteCom)” (no.: CZ.02.1.01/0.0/0.0/17 048/0007267); and by Grant No. SGS-2022-016 Advanced methods of data processing and analysis. Computational resources were supplied by the project “e-Infrastruktura CZ" (e-INFRA CZ LM2018140) supported by the Ministry of Education, Youth and Sports of the Czech Republic.

Author information

Correspondence to Pavel Přibáň.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Přibáň, P., Šmíd, J., Mištera, A., Král, P. (2022). Linear Transformations for Cross-lingual Sentiment Analysis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol. 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_11


  • DOI: https://doi.org/10.1007/978-3-031-16270-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16269-5

  • Online ISBN: 978-3-031-16270-1

  • eBook Packages: Computer Science (R0)
