Similarity-Based Dataset Recommendation Across Languages and Domains to Sentiment Analysis in the Electoral Domain

dos Santos, Jéssica Soares; Bernardini, Flavia; Paes, Aline

doi:10.1007/978-3-031-23213-8_7

Jéssica Soares dos Santos (ORCID: orcid.org/0000-0001-5082-4583),
Flavia Bernardini &
Aline Paes

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13392)

Included in the following conference series:

International Conference on Electronic Participation

Abstract

Traditional machine learning classifiers often fail to predict labels for new data whose distribution differs from that of the training data. This is particularly true of sentiment classifiers, as vocabulary and people’s opinions evolve rapidly. Naturally, the problem worsens when there are only a few, or even no, labeled instances in the target domain. In this paper, we propose a dataset recommendation method based on multilingual embeddings and similarity metrics to choose which sentiment analysis datasets to use as the training set when labeled data is unavailable or scarce. We adopt sentiment analysis in the electoral domain as our case study, given the complexity and difficulty of manually labeling millions of political social media opinions during the short period of a campaign. Our results suggest that dataset similarity may be considered, even when datasets are in different languages, to minimize the negative effects of domain shift in sentiment classification tasks.
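
As a rough illustration of similarity-based dataset recommendation, the sketch below ranks candidate labeled datasets by how close they are to an unlabeled target dataset. It is only a sketch under stated assumptions: the abstract does not say which similarity metric or aggregation the authors use, so cosine similarity between mean embedding vectors is assumed here. The function names, dataset names, and toy 3-dimensional vectors are hypothetical; in practice the vectors would come from a multilingual sentence encoder.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(embeddings: List[Vector]) -> List[float]:
    """Average a dataset's sentence embeddings into a single vector."""
    if not embeddings:
        raise ValueError("dataset has no embeddings")
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / len(embeddings) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_datasets(
    target: List[Vector], candidates: Dict[str, List[Vector]]
) -> List[Tuple[str, float]]:
    """Rank candidate datasets by centroid similarity to the target, best first.

    Assumption: centroid cosine similarity stands in for whatever dataset
    similarity metric is actually used; it is not taken from the paper.
    """
    target_centroid = centroid(target)
    scores = [
        (name, cosine_similarity(target_centroid, centroid(embs)))
        for name, embs in candidates.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real multilingual sentence embeddings.
target = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
candidates = {
    "product_reviews_en": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
    "political_tweets_pt": [[0.85, 0.15, 0.05], [0.9, 0.2, 0.0]],
}
ranking = recommend_datasets(target, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the hypothetical Portuguese political dataset ranks above the English product reviews, reflecting the idea that a similar dataset in another language may be a better training source than a dissimilar one in the same language.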

Acknowledgement

This research was supported by the Brazilian Research CNPq APQ Universal (Grant 421608/2018-8), CNPq Research Grant 311275/2020-6, FAPERJ Research grant E26/202.914/2019 (247109), Microsoft Research Grant and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES).

Author information

Authors and Affiliations

Institute of Computing, Fluminense Federal University, Niterói, RJ, Brazil
Jéssica Soares dos Santos, Flavia Bernardini & Aline Paes

Corresponding author

Correspondence to Jéssica Soares dos Santos.

Editor information

Editors and Affiliations

University of Tartu, Tartu, Estonia
Robert Krimmer
University of South-Eastern Norway, Borre, Norway
Marius Rohde Johannessen
Danube University Krems, Krems, Austria
Thomas Lampoltshammer
Linköping University, Linköping, Sweden
Ida Lindgren
Danube University Krems, Krems, Austria
Peter Parycek
University of Zurich, Zurich, Switzerland
Gerhard Schwabe
Delft University of Technology, Delft, The Netherlands
Jolien Ubacht

About this paper

Cite this paper

dos Santos, J.S., Bernardini, F., Paes, A. (2022). Similarity-Based Dataset Recommendation Across Languages and Domains to Sentiment Analysis in the Electoral Domain. In: Krimmer, R., et al. Electronic Participation. ePart 2022. Lecture Notes in Computer Science, vol 13392. Springer, Cham. https://doi.org/10.1007/978-3-031-23213-8_7

DOI: https://doi.org/10.1007/978-3-031-23213-8_7
Published: 08 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23212-1
Online ISBN: 978-3-031-23213-8
eBook Packages: Computer Science, Computer Science (R0)
