Sentiment Analysis for Brazilian Portuguese over a Skewed Class Corpora

  • Conference paper
  • Computational Processing of the Portuguese Language (PROPOR 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Abstract

The goal of this paper is to compare existing sentiment analysis models, namely Doc2Vec and the Recursive Neural Tensor Network (RNTN), when applied to a skewed-class corpus. Such a setting is not uncommon, but the literature lacks results on it. We used two techniques to balance the classes: under-sampling and over-sampling the target corpus. Doc2Vec achieved the best overall result on the skewed classes, but performed poorly on the small and sampled configurations. RNTN achieved the best result on the over-sampled corpus. The Naive Bayes baseline was not surpassed on the under-sampled corpus with Pos/Neg classes, which was the smallest corpus configuration.
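
The two balancing techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sampling was carried out, so the code below assumes plain random sampling: under-sampling shrinks every class to the size of the smallest one, and over-sampling grows every class to the size of the largest one by duplicating randomly chosen examples. The function names, the `(text, label)` data layout, and the `seed` parameter are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def group_by_label(samples):
    """Group (text, label) pairs into a dict mapping label -> list of texts."""
    groups = defaultdict(list)
    for text, label in samples:
        groups[label].append(text)
    return groups


def under_sample(samples, seed=0):
    """Randomly drop examples so every class matches the smallest class.

    Assumption: uniform random selection without replacement.
    """
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = min(len(texts) for texts in groups.values())
    balanced = [
        (text, label)
        for label, texts in groups.items()
        for text in rng.sample(texts, target)
    ]
    rng.shuffle(balanced)
    return balanced


def over_sample(samples, seed=0):
    """Randomly duplicate examples so every class matches the largest class.

    Assumption: duplicates are drawn uniformly with replacement;
    all original examples are kept.
    """
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = max(len(texts) for texts in groups.values())
    balanced = []
    for label, texts in groups.items():
        extra = [rng.choice(texts) for _ in range(target - len(texts))]
        balanced.extend((text, label) for text in texts + extra)
    rng.shuffle(balanced)
    return balanced
```

On a toy corpus with six positive and two negative reviews, `under_sample` would return two examples per class and `over_sample` six per class. Under-sampling discards data, which is consistent with the abstract's remark that the under-sampled Pos/Neg configuration was the smallest one.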

Notes

  1. www.skoob.com.br

  2. http://nlp.stanford.edu/software/lex-parser.shtml

  3. http://radimrehurek.com/gensim/models/doc2vec.html

Acknowledgments

This work was partially supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013.

Author information

Corresponding author

Correspondence to Henrico Brum.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Brum, H., Araujo, F., Kepler, F. (2016). Sentiment Analysis for Brazilian Portuguese over a Skewed Class Corpora. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_14

  • DOI: https://doi.org/10.1007/978-3-319-41552-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41551-2

  • Online ISBN: 978-3-319-41552-9

  • eBook Packages: Computer Science, Computer Science (R0)
