Abstract
This paper deals with automatic two-class document-level sentiment classification. We retrieved textual documents with political, business, economic and financial content from five Slovenian web media. By annotating a sample of 10,427 documents, we obtained a labelled corpus in the Slovenian language. Five classifiers were evaluated on this corpus: multinomial naïve Bayes, support vector machines, random forest, k-nearest neighbour and naïve Bayes; the first three were also used to assess the pre-processing options. Among the selected classifiers, multinomial naïve Bayes outperforms the naïve Bayes, k-nearest neighbour, random forest and support vector machine classifiers in terms of classification accuracy. The best selection of pre-processing options achieves more than 95% classification accuracy with multinomial naïve Bayes and more than 85% with the support vector machine and random forest classifiers.
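To illustrate the best-performing model family, the block below gives a minimal, dependency-free sketch of a two-class multinomial naïve Bayes classifier. It uses bag-of-words counts and add-alpha (Laplace) smoothing. The tokenizer, the `alpha` parameter, the choice to ignore words unseen in training, and the toy English sentences are all assumptions made for illustration. They are not the authors' pre-processing options, their implementation, or their Slovenian corpus.

```python
import math
import re
from collections import Counter


def tokenize(text, lowercase=True):
    """Split a document into word tokens.

    Lowercasing is one hypothetical pre-processing option. The str-pattern
    regex is Unicode-aware, so Slovenian letters such as č, š, ž are kept.
    """
    if lowercase:
        text = text.lower()
    return re.findall(r"\w+", text)


class MultinomialNB:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        doc_counts = Counter(labels)
        # Accumulate per-class token counts (the multinomial event model)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary = every word seen in any training document
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # log P(c) from class frequencies in the training set
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        # Total number of tokens observed per class
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # log P(word | c), smoothed over the training vocabulary
        num = self.word_counts[c][word] + self.alpha
        den = self.total[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word in tokenize(doc):
                if word in self.vocab:  # words unseen in training are ignored
                    score += self._log_likelihood(word, c)
            scores[c] = score
        # Pick the class with the highest log-posterior score
        return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Toy, hypothetical training data (not taken from the paper's corpus)
    train_docs = [
        "profits rise strongly",
        "shares rise on strong results",
        "losses deepen as shares fall",
        "company reports heavy losses",
    ]
    train_labels = ["positive", "positive", "negative", "negative"]
    model = MultinomialNB().fit(train_docs, train_labels)
    print(model.predict("shares rise"))   # positive
    print(model.predict("heavy losses"))  # negative
```

Log-probabilities are summed instead of multiplying raw probabilities, so scores do not underflow on long news articles. A real evaluation would measure accuracy on held-out documents rather than on the training set.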
Acknowledgments
This work was supported by the Creative Core FISNM-3330-13-500033 ‘Simulations’ project, funded by the European Union and the European Regional Development Fund, and by the Young Researcher Programme of the Slovenian Research Agency. The operation is carried out within the framework of the Operational Programme for Strengthening Regional Development Potentials for the period 2007–2013, Development Priority 1: Competitiveness and research excellence, Priority Guideline 1.1: Improving the competitive skills and research excellence.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Bučar, J., Povh, J., Žnidaršič, M. (2016). Sentiment Classification of the Slovenian News Texts. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_73
DOI: https://doi.org/10.1007/978-3-319-26227-7_73
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26225-3
Online ISBN: 978-3-319-26227-7
eBook Packages: Engineering, Engineering (R0)