CatSent: a Catalan sentiment analysis website

Multimedia Tools and Applications

Abstract

In this paper we investigate, analyze and compare sentiment analysis methodologies for Catalan tweets. The main goal is to develop a high-performance Catalan classifier. The work has three main components: a Catalan language preprocessing tool, a classification model and a training corpus. The preprocessing tool cleans a document (or tweet) and extracts features from it. This is a key step because of the great morphological complexity of the Catalan language: the tool removes stop words from the text and reduces the remaining words to their roots. The classification algorithm labels each tweet as either “positive” or “negative”. To choose the best algorithm, five models are compared: Naïve Bayes, Maximum Entropy, Support Vector Machine, Decision Tree and Neural Networks. The corpus is then used to train and test these methods. As there is no known public corpus in Catalan, we created one using a lexicon-based approach. This work aims to provide the tools needed to carry out sentiment analysis studies in Catalan. The last step is to build a public web service based on the best classification model obtained, where users can check its effectiveness.
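The pipeline outlined in the abstract (preprocess the text, label a corpus with a lexicon, then train a classifier on it) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the suffix-stripping "stemmer", the polarity lexicon and the example tweets are all hypothetical placeholders, and only Naïve Bayes (one of the five models compared) is shown, written from scratch rather than with a library.

```python
import math
import re
from collections import Counter

# Hypothetical mini stop-word list; a real system would use a full
# Catalan list such as the Ranks NL or LaTeL lists cited in the notes.
STOP_WORDS = {"a", "de", "el", "els", "em", "en", "és", "i", "la", "les",
              "molt", "per", "que", "un", "una"}

# Hypothetical suffixes for a toy stemmer; NOT a real Catalan stemmer.
SUFFIXES = ("ament", "es", "s")

# Hypothetical polarity lexicon over stemmed tokens (+1 / -1).
LEXICON = {"bo": 1, "feliç": 1, "genial": 1,
           "dolent": -1, "horrible": -1, "trist": -1}


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweet):
    """Lowercase, tokenize, drop stop words and stem what remains."""
    tokens = re.findall(r"\w+", tweet.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def lexicon_label(tokens):
    """Label by summed lexicon polarity; None when the score is 0."""
    score = sum(LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels))
                           for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            return self.log_priors[c] + sum(
                math.log((self.counts[c][t] + 1) / denom)
                for t in tokens if t in self.vocab)  # unseen words are ignored
        return max(self.classes, key=log_posterior)


# Hypothetical raw tweets; the lexicon turns them into a labelled corpus.
raw_tweets = [
    "Quin dia tan genial i feliç",
    "El concert ha estat genial",
    "Un servei horrible i dolent",
    "Estic trist, el dia ha estat horrible",
    "Avui plou a Lleida",  # no lexicon hit, so it is discarded
]
corpus = []
for text in raw_tweets:
    tokens = preprocess(text)
    label = lexicon_label(tokens)
    if label is not None:
        corpus.append((tokens, label))

model = NaiveBayes().fit([t for t, _ in corpus], [lab for _, lab in corpus])
print(model.predict(preprocess("Un dia genial")))         # positive
print(model.predict(preprocess("Quin servei horrible")))  # negative
```

In this sketch, tweets that the lexicon cannot score are simply left out of the training corpus; that is a design choice of the example, not necessarily what the paper does. A realistic version would replace the toy stemmer with a proper Catalan stemmer or lemmatizer and use far larger lexicons and corpora.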

Fig. 1

Notes

  1. GitHub repository: https://github.com/pbalaguer19/catalan-sentiment-analysis

  2. API repository: https://github.com/pbalaguer19/catsent-api/

  3. NumPy: http://www.numpy.org

  4. Pandas: http://pandas.pydata.org

  5. TensorFlow: https://www.tensorflow.org

  6. Language detection service: https://detectlanguage.com

  7. Project on GitHub: https://github.com/pbalaguer19/catalan-sentiment-analysis

  8. Ranks NL: http://www.ranks.nl/stopwords/catalan

  9. LaTeL: http://latel.upf.edu/morgana/altres/pub/ca_stop.htm

  10. In geometry, a hyperplane is a subspace whose dimension is one less than that of the space containing it; it divides that space into two parts

  11. http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow

  12. https://github.com/dennybritz/cnn-text-classification-tf

  13. https://github.com/dennybritz/cnn-text-classification-tf

  14. RESTful Web services are one way of providing interoperability between computer systems on the Internet.
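Note 10 concerns hyperplanes, which is how a linear Support Vector Machine separates the two sentiment classes. As a hedged illustration with made-up weights (not learned from any data), the decision rule reduces to checking the sign of w·x + b, that is, which side of the hyperplane w·x + b = 0 a feature vector x lies on:

```python
def side_of_hyperplane(w, b, x):
    """Return 1 or -1 for the half-space containing x, or 0 if x lies on the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return (score > 0) - (score < 0)


# Hypothetical 2-D feature space: [positive-word count, negative-word count]
w = [1.0, -1.0]  # made-up weights, not trained
b = 0.0
print(side_of_hyperplane(w, b, [2, 0]))  # 1  (positive side)
print(side_of_hyperplane(w, b, [0, 3]))  # -1 (negative side)
print(side_of_hyperplane(w, b, [1, 1]))  # 0  (on the hyperplane)
```

A trained SVM chooses w and b so that this hyperplane separates the classes with the widest possible margin; the sign test itself stays the same.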

Acknowledgements

This work was supported by the Ministerio de Economía y Competitividad under contract TIN2017-84553-C2-2-R. IT, JV, JM, JR and FS are members of the research group 2017-SGR363, funded by the Generalitat de Catalunya. In addition, this research was partly supported by the European Union FEDER (CAPAP-H6 network TIN2016-81840-REDT). The research leading to these results has received funding from RecerCaixa.

Author information

Correspondence to Francesc Solsona.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Balaguer, P., Teixidó, I., Vilaplana, J. et al. CatSent: a Catalan sentiment analysis website. Multimed Tools Appl 78, 28137–28155 (2019). https://doi.org/10.1007/s11042-019-07877-7

  • DOI: https://doi.org/10.1007/s11042-019-07877-7