Abstract
Recent interest in Sentiment Analysis systems has drawn attention to the definition of effective methods for accurately detecting opinions and sentiments in text. Many approaches in the literature rely on hand-coded resources that model the prior polarity of words or multi-word expressions. Building such resources is generally expensive, and coverage issues arise given the variety of linguistic phenomena through which sentiment is expressed. This paper presents an automatic method for deriving a large-scale polarity lexicon based on Distributional Models of lexical semantics. Given a set of sentences annotated with polarity, we transfer the sentiment information from sentences to words. The annotated examples are drawn from Twitter, and sentence polarity is assigned by simple heuristics. The approach is mostly unsupervised, and an experimental evaluation on two Sentiment Analysis tasks shows the benefits of the generated resource.
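The transfer of polarity from sentences to words can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors stand in for a word2vec space, the labeled sentences are invented, and a nearest-centroid scorer is swapped in for whatever classifier the paper trains. All names (`WORD_VECTORS`, `LABELED_SENTENCES`, `word_polarity`, and so on) are hypothetical. The sketch does follow two details given in the notes: sentence vectors are scaled to norm 1, and per-class scores are turned into a distribution with a softmax, \(o_i = e^{s_i}/\sum_{j} e^{s_j}\).

```python
import math

# Toy 2-d word vectors standing in for a word2vec space (hypothetical values).
WORD_VECTORS = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [0.1, 0.9],
    "awful": [0.2, 0.8],
    "movie": [0.5, 0.5],
    "happy": [0.85, 0.15],  # never appears in a labeled sentence
}

# Sentences with heuristically assigned polarity (invented examples).
LABELED_SENTENCES = [
    (["good", "movie"], "positive"),
    (["great", "movie"], "positive"),
    (["bad", "movie"], "negative"),
    (["awful", "movie"], "negative"),
]


def unit(vec):
    """Scale a vector to norm 1 (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def sentence_vector(tokens):
    """Represent a sentence as the normalized sum of its known word vectors."""
    dims = len(next(iter(WORD_VECTORS.values())))
    total = [0.0] * dims
    for token in tokens:
        for i, x in enumerate(WORD_VECTORS.get(token, [0.0] * dims)):
            total[i] += x
    return unit(total)


def train_centroids(labeled):
    """Stand-in classifier: one normalized centroid per polarity class."""
    sums = {}
    for tokens, label in labeled:
        vec = sentence_vector(tokens)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {label: unit(acc) for label, acc in sums.items()}


def softmax(scores):
    """Turn per-class scores into a distribution that sums to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def word_polarity(word, centroids):
    """Score a single word in the same space the sentences were scored in."""
    vec = unit(WORD_VECTORS[word])
    scores = {
        label: sum(a * b for a, b in zip(vec, centroid))
        for label, centroid in centroids.items()
    }
    return softmax(scores)


centroids = train_centroids(LABELED_SENTENCES)
lexicon = {word: word_polarity(word, centroids) for word in WORD_VECTORS}
```

Because words and sentences live in the same vector space, a model fitted only on sentences can score any word in the vocabulary, including words such as `happy` that never occur in the labeled data. This is the sense in which the sentiment information is transferred.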
Notes
- 1.
- 2.
- 3.
\(o_i^k = e^{s_i^k} / \sum_{j=1}^{m} e^{s_j^k}\).
- 4.
- 5.
word2vec settings are: min-count=50, window=5, iter=10 and negative=10.
- 6.
- 7.
To avoid a bias toward the query terms, the last token is not used in the combination.
- 8.
The Distributional Model for Italian is acquired with word2vec on 2 million Italian tweets; we used exactly the same emoticons to select the messages and the same learning strategy.
- 9.
We normalize the resulting vector \(\mathbf{t}\) so that it has norm \(1\).
- 10.
We tested the Twitter Word Space, but the results were unsatisfactory because of the noise in tweets.
- 11.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Castellucci, G., Croce, D., Basili, R. (2015). Acquiring a Large Scale Polarity Lexicon Through Unsupervised Distributional Methods. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2015. Lecture Notes in Computer Science, vol 9103. Springer, Cham. https://doi.org/10.1007/978-3-319-19581-0_6
DOI: https://doi.org/10.1007/978-3-319-19581-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19580-3
Online ISBN: 978-3-319-19581-0
eBook Packages: Computer Science (R0)