Sentiment Dictionary Refinement Using Word Embeddings

  • Conference paper
  • In: Foundations of Intelligent Systems (ISMIS 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9384)

Abstract

Previous work on Polish sentiment dictionaries revealed the superiority of machine learning on vectors created from word contexts (concordances or word co-occurrence distributions), especially over the SO-PMI method (semantic orientation from pointwise mutual information). This paper demonstrates that this state-of-the-art method can be improved by extending the vectors with word embeddings obtained from skip-gram language models. Specifically, it proposes a new method of computing word sentiment polarity using feature sets that combine word embeddings with word co-occurrence distributions. The new technique is evaluated in a number of experimental settings.
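
As a rough sketch of the feature construction described above, the Python fragment below concatenates a word's skip-gram embedding with its co-occurrence distribution and trains a linear classifier on a small seed lexicon. All inputs here (embeddings, cooccurrence, seed_lexicon) are hypothetical placeholders, and scikit-learn's LinearSVC merely stands in for a LIBLINEAR-style linear model; none of this reproduces the paper's actual corpora, dimensions or settings.

    import numpy as np
    from sklearn.svm import LinearSVC

    def build_features(word, embeddings, cooccurrence):
        # Concatenate the word's skip-gram embedding with its
        # co-occurrence counts, normalized into a distribution.
        emb = embeddings[word]
        cooc = cooccurrence[word].astype(float)
        cooc /= max(cooc.sum(), 1.0)
        return np.concatenate([emb, cooc])

    def train_polarity_classifier(seed_lexicon, embeddings, cooccurrence):
        # seed_lexicon: {word: +1 or -1}, a small hand-labeled dictionary.
        words = [w for w in seed_lexicon if w in embeddings and w in cooccurrence]
        X = np.vstack([build_features(w, embeddings, cooccurrence) for w in words])
        y = np.array([seed_lexicon[w] for w in words])
        return LinearSVC().fit(X, y)

Unlabeled candidate words could then be scored with the fitted classifier's predict method to extend or refine the dictionary.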

This work was funded by the National Science Centre of Poland, grant no. UMO-2012/05/N/ST6/03587.


Notes

  1. Micro means to calculate metrics globally by counting the total true positives, false negatives and false positives. Thus, it takes label imbalance into account. The formula is \(F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}\).

  2. Macro means to calculate metrics for each label and find their unweighted mean. This does not take label imbalance into account (both averaging schemes are illustrated in the sketch below).
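
    As a quick, made-up illustration of both averaging schemes (using scikit-learn, which is not part of the paper's setup):

        from sklearn.metrics import f1_score

        # Toy predictions over three imbalanced labels.
        y_true = ["pos", "pos", "pos", "pos", "neg", "neu"]
        y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]

        # Micro pools TP, FP and FN across labels first; for single-label
        # multi-class data this equals accuracy (4/6 here).
        print(f1_score(y_true, y_pred, average="micro"))  # 0.666...

        # Macro computes F1 per label, then the unweighted mean:
        # pos = 0.75, neg = 2/3, neu = 0.0, giving roughly 0.472.
        print(f1_score(y_true, y_pred, average="macro"))  # ~0.472

    The gap between the two scores (0.667 vs. 0.472) is exactly the label-imbalance effect the notes describe.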


Author information

Correspondence to Aleksander Wawer.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wawer, A. (2015). Sentiment Dictionary Refinement Using Word Embeddings. In: Esposito, F., Pivert, O., Hacid, M.-S., Raś, Z., Ferilli, S. (eds) Foundations of Intelligent Systems. ISMIS 2015. Lecture Notes in Computer Science (LNAI), vol 9384. Springer, Cham. https://doi.org/10.1007/978-3-319-25252-0_20

  • DOI: https://doi.org/10.1007/978-3-319-25252-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25251-3

  • Online ISBN: 978-3-319-25252-0
