Acquiring a Large Scale Polarity Lexicon Through Unsupervised Distributional Methods

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9103)

Abstract

Recent interest in Sentiment Analysis systems has drawn attention to the definition of effective methods for detecting opinions and sentiments in text with good accuracy. Many approaches in the literature rely on hand-coded resources that model the prior polarity of words or multi-word expressions. Building such resources is generally expensive, and coverage issues arise because sentiment is expressed through a wide variety of linguistic phenomena. This paper presents an automatic method for deriving a large-scale polarity lexicon based on Distributional Models of lexical semantics. Given a set of sentences annotated with polarity, we transfer the sentiment information from sentences to words. The annotated examples are drawn from Twitter, and polarity is assigned to sentences by simple heuristics. The approach is mostly unsupervised, and an experimental evaluation on two Sentiment Analysis tasks shows the benefits of the generated resource.
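The abstract says only that sentence polarity is assigned by "simple heuristics", and the notes mention that emoticons were used to select messages. As a rough illustration of what such a heuristic could look like, the sketch below labels a message from the emoticons it contains. The emoticon sets, the label names, the rule for mixed messages, and the function name `heuristic_polarity` are all assumptions made for this example, not details taken from the paper.

```python
from typing import Optional

# Hypothetical emoticon sets, chosen only for illustration;
# the lists actually used by the authors are not given here.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}


def heuristic_polarity(message: str) -> Optional[str]:
    """Return 'positive', 'negative', or None for a whitespace-tokenized message."""
    tokens = set(message.split())
    has_pos = bool(tokens & POSITIVE_EMOTICONS)
    has_neg = bool(tokens & NEGATIVE_EMOTICONS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    # Mixed or emoticon-free messages get no label (an assumed design choice).
    return None
```

Returning `None` for ambiguous messages keeps the automatically labeled set small but less noisy, which is one plausible trade-off for a mostly unsupervised pipeline.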

Notes

  1. http://www.twitter.com.

  2. http://sag.art.uniroma2.it/distributional-polarity-lexicon.

  3. \(o_i^k = e^{s_i^k} / \sum_{j=1}^{m} e^{s_j^k}\).

  4. https://code.google.com/p/word2vec/.

  5. The word2vec settings are: min-count=50, window=5, iter=10 and negative=10.

  6. http://sag.art.uniroma2.it/demo-and-software/kelp.

  7. To avoid a bias toward the query terms, the last token is not used in the combination.

  8. The Distributional Model for Italian is acquired with word2vec on 2 million Italian tweets; we used exactly the same emoticons to select the messages and the same learning strategy.

  9. We normalize the resulting vector \(\mathbf{t}\) so that it has norm \(1\).

  10. We tested the Twitter Word Space, but the results were unsatisfactory because of the noise in tweets.

  11. http://sifaka.cs.uiuc.edu/~wang296/Data/index.html.
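Notes 3 and 9 describe two small numeric steps: a softmax that maps the \(m\) scores \(s_j^k\) to values \(o_i^k\) that sum to 1, and a rescaling of a vector to norm 1. The following is a minimal Python sketch of both, assuming plain lists of floats and the Euclidean (L2) norm, which note 9 does not name explicitly. The function names `softmax` and `unit_normalize` are illustrative and do not come from the paper.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Return o_i = exp(s_i) / sum_j exp(s_j) for one list of scores (note 3)."""
    # Subtracting the maximum leaves the result unchanged and avoids overflow.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def unit_normalize(vector: List[float]) -> List[float]:
    """Rescale a vector to norm 1 (note 9); using the L2 norm is an assumption."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction, so it is returned unchanged.
        return list(vector)
    return [x / norm for x in vector]


if __name__ == "__main__":
    print(softmax([2.0, 1.0, 0.1]))
    print(unit_normalize([3.0, 4.0]))
```

The softmax output can be read as a distribution over the \(m\) classes, and unit normalization makes vectors comparable regardless of their original length.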

Author information

Corresponding author

Correspondence to Giuseppe Castellucci.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Castellucci, G., Croce, D., Basili, R. (2015). Acquiring a Large Scale Polarity Lexicon Through Unsupervised Distributional Methods. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2015. Lecture Notes in Computer Science, vol 9103. Springer, Cham. https://doi.org/10.1007/978-3-319-19581-0_6

  • DOI: https://doi.org/10.1007/978-3-319-19581-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19580-3

  • Online ISBN: 978-3-319-19581-0

  • eBook Packages: Computer Science, Computer Science (R0)
