Abstract
This paper compares two existing sentiment analysis models, Doc2Vec and the Recursive Neural Tensor Network (RNTN), when applied to a corpus with skewed classes. Such a setting is not uncommon, but the literature lacks results on it. We used two techniques to balance the classes: under-sampling and over-sampling the target corpora. Doc2Vec achieved the best overall result on the skewed classes, but performed poorly on the small and sampled configurations. RNTN achieved the best result on the over-sampled corpus. The Naive Bayes baseline was not surpassed on the under-sampled corpus with Pos/Neg classes, which was the smallest corpus configuration.
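To make the two balancing techniques concrete, the following is a minimal sketch of random under-sampling and random over-sampling on a labelled corpus. The abstract does not say how the sampling was carried out, so the random selection strategy, the function names (`under_sample`, `over_sample`, `group_by_label`) and the toy data are all assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import Counter, defaultdict


def group_by_label(samples):
    """Group (text, label) pairs by their label."""
    groups = defaultdict(list)
    for text, label in samples:
        groups[label].append((text, label))
    return groups


def under_sample(samples, seed=0):
    """Randomly discard examples until every class matches the smallest one."""
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Sampling without replacement keeps `target` distinct examples.
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


def over_sample(samples, seed=0):
    """Randomly duplicate examples until every class matches the largest one."""
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Sampling with replacement fills the gap with duplicates.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy corpus: 6 positive and 2 negative reviews.
corpus = [(f"positive review {i}", "pos") for i in range(6)]
corpus += [(f"negative review {i}", "neg") for i in range(2)]

under = under_sample(corpus)
over = over_sample(corpus)
print(Counter(label for _, label in under))
print(Counter(label for _, label in over))
```

Under-sampling shrinks the corpus to the size of the minority class, which is consistent with the under-sampled Pos/Neg configuration being the smallest one, while over-sampling keeps all original examples and adds duplicates of the minority class.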
Acknowledgments
This work was partially supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Brum, H., Araujo, F., Kepler, F. (2016). Sentiment Analysis for Brazilian Portuguese over a Skewed Class Corpora. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_14
DOI: https://doi.org/10.1007/978-3-319-41552-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41551-2
Online ISBN: 978-3-319-41552-9
eBook Packages: Computer Science, Computer Science (R0)