Abstract
In this paper we investigate, analyze and compare sentiment analysis methodologies for Catalan tweets. The main goal is to develop a high-performance Catalan sentiment classifier. The work has three main components: a Catalan-language preprocessing tool, a classification model and a training corpus. The preprocessing tool cleans a document (or tweet) and extracts its features. This is a key step because of the great morphological complexity of the Catalan language: the tool removes stop words from the text and reduces the remaining words to their roots. The classification algorithm then assigns each tweet to either the “positive” or the “negative” class. To choose the best algorithm, five models are compared: Naïve Bayes, Maximum Entropy, Support Vector Machine, Decision Tree and Neural Networks. The corpus is used for training and testing these methods; since there is no known public corpus in Catalan, we created one using a lexicon-based approach. This work aims to provide the tools needed to carry out sentiment analysis studies in the Catalan language. The last step is to develop a public web service, based on the best classification model obtained, where users can check its effectiveness.
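The pipeline in the abstract (clean and stem, label a corpus with a lexicon, train a classifier) can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the suffix-stripping stemmer, the seed lexicons and the tiny tweet sample are all hypothetical, and the classifier is a generic multinomial Naïve Bayes (one of the five model families the paper compares) with add-one smoothing.

```python
import math
import re
from collections import Counter

# Hypothetical mini stop-word list; a real system would use a full Catalan list.
STOP_WORDS = {"el", "la", "els", "les", "i", "de", "a", "que", "és", "un", "una", "en", "molt"}

# Hypothetical suffixes for a naive suffix-stripping stemmer, tried longest first.
SUFFIXES = sorted(("acions", "ament", "ació", "ades", "ada", "ats", "at", "es", "a", "s"),
                  key=len, reverse=True)

# Hypothetical seed lexicons, used only to label the training corpus.
POSITIVE = {"genial", "feliç", "encanta", "fantàstica", "bo", "bona"}
NEGATIVE = {"desastre", "trist", "odio", "horrible", "dolent"}


def tokenize(text):
    """Lowercase, drop URLs and @mentions, split into words, remove stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [w for w in re.findall(r"[\w·]+", text) if w not in STOP_WORDS]


def stem(word, min_stem=3):
    """Strip the longest matching suffix, keeping at least `min_stem` characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Full preprocessing: tokenize, remove stop words, then stem."""
    return [stem(w) for w in tokenize(text)]


def lexicon_label(text):
    """Label a tweet by counting lexicon hits; None means neutral or ambiguous."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        v = len(self.vocab)

        def log_posterior(c):
            # Words never seen in training are ignored; known words are smoothed.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t in tokens if t in self.vocab)

        return max(self.classes, key=log_posterior)


tweets = [
    "Quin dia tan genial, estic molt feliç! #bondia",
    "M'encanta aquesta cançó, és fantàstica",
    "Quin desastre de servei, estic trist",
    "Odio els dilluns, tot és horrible",
    "Avui fa sol",  # no lexicon hit, so it is left out of the corpus
]
corpus = [(preprocess(t), lab) for t in tweets if (lab := lexicon_label(t)) is not None]
docs, labels = zip(*corpus)
model = NaiveBayes().fit(list(docs), list(labels))

print(model.predict(preprocess("Un concert genial i fantàstic")))  # positive
print(model.predict(preprocess("Quin servei tan horrible")))  # negative
```

Labelling with the raw tokens but training on stems keeps the seed lexicon readable while still letting the classifier match inflected forms such as "fantàstica" and "fantàstic", which both stem to "fantàstic". Tweets with no lexicon hit or a tied score are discarded rather than guessed, so the derived corpus is smaller but less noisy.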
Notes
GitHub repository: https://github.com/pbalaguer19/catalan-sentiment-analysis
API repository: https://github.com/pbalaguer19/catsent-api/
NumPy: http://www.numpy.org
Pandas: http://pandas.pydata.org
TensorFlow: https://www.tensorflow.org
Language detection service: https://detectlanguage.com
Project on GitHub: https://github.com/pbalaguer19/catalan-sentiment-analysis
Ranks NL: http://www.ranks.nl/stopwords/catalan
In geometry, a hyperplane is a subspace whose dimension is one less than that of its ambient space; it divides the space into two parts (half-spaces).
RESTful Web services are one way of providing interoperability between computer systems on the Internet.
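As a standard illustration of the hyperplane note (the textbook linear SVM formulation, not the paper's specific configuration), each tweet is represented as a feature vector $\mathbf{x} \in \mathbb{R}^n$, and the encoding $y_i = +1$ for positive and $y_i = -1$ for negative tweets is an assumption made here for the example:

```latex
H = \{\, \mathbf{x} \in \mathbb{R}^n : \mathbf{w}^\top \mathbf{x} + b = 0 \,\}, \qquad
\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left(\mathbf{w}^\top \mathbf{x} + b\right)

\min_{\mathbf{w},\, b} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
\quad \text{subject to} \quad y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1, \quad i = 1, \dots, m
```

The hyperplane $H$ splits $\mathbb{R}^n$ into the two half-spaces where $\mathbf{w}^\top \mathbf{x} + b$ is positive or negative, which are the "two parts" the note refers to. The constraint places every training tweet on the correct side at a distance of at least $1/\lVert \mathbf{w} \rVert$ from $H$; this is the hard-margin form, and practical text classifiers usually use the soft-margin variant with slack variables.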
Acknowledgements
This work was supported by the Spanish Ministerio de Economía y Competitividad under contract TIN2017-84553-C2-2-R. IT, JV, JM, JR and FS are members of the research group 2017-SGR363, funded by the Generalitat de Catalunya. In addition, this research was partly supported by the European Union FEDER (CAPAP-H6 network TIN2016-81840-REDT). The research leading to these results has also received funding from RecerCaixa.
About this article
Cite this article
Balaguer, P., Teixidó, I., Vilaplana, J. et al. CatSent: a Catalan sentiment analysis website. Multimed Tools Appl 78, 28137–28155 (2019). https://doi.org/10.1007/s11042-019-07877-7