Generating Word and Document Embeddings for Sentiment Analysis

Aydın, Cem Rıfkı; Güngör, Tunga; Erkan, Ali

doi:10.1007/978-3-031-24340-0_23

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13452))

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

509 Accesses
2 Citations

Abstract

Sentiments of words can differ from one corpus to another. Inducing general sentiment lexicons for languages and using them cannot, in general, produce meaningful results for different domains. In this paper, we combine contextual and supervised information with the general semantic representations of words occurring in the dictionary. Contexts of words help us capture the domain-specific information and supervised scores of words are indicative of the polarities of those words. When we combine supervised features of words with the features extracted from their dictionary definitions, we observe an increase in the success rates. We try out the combinations of contextual, supervised, and dictionary-based approaches, and generate original vectors. We also combine the word2vec approach with hand-crafted features. We induce domain-specific sentimental vectors for two corpora, which are the movie domain and the Twitter datasets in Turkish. When we thereafter generate document vectors and employ the support vector machines method utilising those vectors, our approaches perform better than the baseline studies for Turkish with a significant margin. We evaluated our models on two English corpora as well and these also outperformed the word2vec approach. It shows that our approaches are cross-domain and portable to other languages.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Sentiment Dictionary Refinement Using Word Embeddings

DoSLex: automatic generation of all domain semantically rich sentiment lexicon

Article 18 July 2024

Enhanced Word Embeddings with Sentiment Contextualized Vectors for Sentiment Analysis

Notes

References

Akın, A.A., Akın, M.D.: Zemberek, an open source NLP framework for Turkic languages. Structure 10, 1–5 (2007)
Google Scholar
Baccianella, S., Esuli, A., Sebastiani, F.: SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Calzolari, N., et al. (eds.) LREC European Language Resources Association (2010)
Google Scholar
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Google Scholar
Boyd-Graber, J., Resnik, P.: Holistic sentiment analysis across languages: multilingual supervised latent Dirichlet allocation. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 45–55. Association for Computational Linguistics (2010)
Google Scholar
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995). https://doi.org/10.1023/A:1022627411411
Article Google Scholar
Ertugrul, A.M., Önal, I., Acartürk, C.: Does the strength of sentiment matter? A regression based approach on Turkish social media. In: Natural Language Processing and Information Systems - 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017, Liège, Belgium, 21–23 June 2017, Proceedings, pp. 149–155 (2017). https://doi.org/10.1007/978-3-319-59569-6_16
Felbo, B., Mislove, A., Søgaard, A., Rahwan, I., Lehmann, S.: Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In: EMNLP, pp. 1615–1625. Association for Computational Linguistics (2017)
Google Scholar
Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. In: Processing, pp. 1–6 (2009)
Google Scholar
Goldberg, Y.: A primer on neural network models for natural language processing. J. Artif. Intell. Res. 1510(726), 345–420 (2016)
Article Google Scholar
Hamilton, W.L., Clark, K., Leskovec, J., Jurafsky, D.: Inducing domain-specific sentiment lexicons from unlabeled corpora. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 595–605. Association for Computational Linguistics (2016)
Google Scholar
Li, F., Huang, M., Zhu, X.: Sentiment analysis with global topics and local dependency. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10), pp. 1371–1376. Association for Computational Linguistics (2010)
Google Scholar
Lin, C., He, Y.: Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 375–384. ACM (2009). https://doi.org/10.1145/1645953.1646003
Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pp. 142–150. Association for Computational Linguistics (2011)
Google Scholar
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR 1301(3781), 1–12 (2013)
Google Scholar
Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA, Valletta, May 2010
Google Scholar
Sak, H., Güngör, T., Saraçlar, M.: Morphological disambiguation of Turkish text with perceptron algorithm. In: Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2007), pp. 107–118. CICLing Press (2007). https://doi.org/10.1007/978-3-540-70939-8_10
Sak, H., Güngör, T., Saraçlar, M.: Turkish language resources: morphological parser, morphological disambiguator and web corpus. In: Nordström, B., Ranta, A. (eds.) GoTAL 2008. LNCS (LNAI), vol. 5221, pp. 417–427. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85287-2_40
Chapter Google Scholar
Tang, D., Wei, F., Qin, B., Liu, T., Zhou, M.: Coooolll: a deep learning system for Twitter sentiment classification. In: Proceedings of the 8th International Workshop on Semantic Evaluation, SemEval@COLING 2014, Dublin, Ireland, 23–24 August 2014, pp. 208–212 (2014)
Google Scholar
Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37, 141–188 (2010)
Article Google Scholar

Download references

Acknowledgments

This work was supported by Boǧaziçi University Research Fund Grant Number 6980D, and by Turkish Ministry of Development under the TAM Project number DPT2007K12-0610. Cem Rifki Aydin has been supported by TüBİTAK BIDEB 2211E.

Author information

Authors and Affiliations

Computer Engineering Department, Boğaziçi University, Bebek, 34342, Istanbul, Turkey
Cem Rıfkı Aydın, Tunga Güngör & Ali Erkan

Authors

Cem Rıfkı Aydın
View author publications
You can also search for this author in PubMed Google Scholar
Tunga Güngör
View author publications
You can also search for this author in PubMed Google Scholar
Ali Erkan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Cem Rıfkı Aydın .

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Aydın, C.R., Güngör, T., Erkan, A. (2023). Generating Word and Document Embeddings for Sentiment Analysis. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_23

Download citation

DOI: https://doi.org/10.1007/978-3-031-24340-0_23
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24339-4
Online ISBN: 978-3-031-24340-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Generating Word and Document Embeddings for Sentiment Analysis