Abstract
Previous work on Polish sentiment dictionaries showed that machine learning on vectors created from word contexts (concordances or word co-occurrence distributions) is superior to other approaches, especially the SO-PMI method (semantic orientation of pointwise mutual information). This paper demonstrates that this state-of-the-art method can be improved by extending the vectors with word embeddings obtained from skip-gram language models. Specifically, it proposes a new method of computing word sentiment polarity using feature sets composed of vectors created from word embeddings and word co-occurrence distributions. The new technique is evaluated in a number of experimental settings.
This work was funded by the National Science Centre of Poland, grant no. UMO-2012/05/N/ST6/03587.
Notes
1. Micro-averaging computes the metric globally, by counting the total true positives, false negatives and false positives over all labels. It therefore takes label imbalance into account. The formula is \(F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}\).
2. Macro-averaging computes the metric separately for each label and takes the unweighted mean. It does not take label imbalance into account.
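The difference between the two averaging schemes in the notes above can be illustrated with a minimal sketch. It is not the evaluation code used in the paper: the function names and the toy labels (`neu`, `pos`, `neg`) are assumptions made for illustration only.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when precision or recall is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_label_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    return tp, fp, fn


def micro_f1(y_true, y_pred):
    """Pool the counts over all labels, then compute a single F1."""
    tp, fp, fn = per_label_counts(y_true, y_pred)
    return f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))


def macro_f1(y_true, y_pred):
    """Compute F1 per label, then take the unweighted mean."""
    tp, fp, fn = per_label_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)


# Hypothetical, imbalanced toy data: a classifier that always predicts "neu".
y_true = ["neu"] * 8 + ["pos", "neg"]
y_pred = ["neu"] * 10

micro = micro_f1(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

On this toy input the micro score is dominated by the frequent label, while the macro score is pulled down by the two rare labels that are never predicted. In a single-label setting such as this one, the micro-averaged F1 coincides with accuracy.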
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wawer, A. (2015). Sentiment Dictionary Refinement Using Word Embeddings. In: Esposito, F., Pivert, O., Hacid, MS., Rás, Z., Ferilli, S. (eds) Foundations of Intelligent Systems. ISMIS 2015. Lecture Notes in Computer Science, vol 9384. Springer, Cham. https://doi.org/10.1007/978-3-319-25252-0_20
DOI: https://doi.org/10.1007/978-3-319-25252-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25251-3
Online ISBN: 978-3-319-25252-0
eBook Packages: Computer Science (R0)