Sentiment Classification of the Slovenian News Texts

Bučar, Jože; Povh, Janez; Žnidaršič, Martin

doi:10.1007/978-3-319-26227-7_73

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 403)

Abstract

This paper deals with automatic two-class document-level sentiment classification. We retrieved textual documents with political, business, economic and financial content from five Slovenian web media. By annotating a sample of 10,427 documents, we obtained a labelled corpus in the Slovenian language. Five classifiers were evaluated on this corpus: multinomial naïve Bayes, support vector machines, random forest, k-nearest neighbour and naïve Bayes, of which the first three were also used to assess the pre-processing options. Among the selected classifiers, multinomial naïve Bayes outperforms the naïve Bayes, k-nearest neighbour, random forest and support vector machine classifiers in terms of classification accuracy. The best selection of pre-processing options achieves more than 95 % classification accuracy with multinomial naïve Bayes and more than 85 % with the support vector machine and random forest classifiers.
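
To make the best-performing classifier named above more concrete, the following is a minimal sketch of multinomial naïve Bayes with Laplace smoothing in plain Python. It is not the authors' implementation, and it does not reproduce their pre-processing or results. The function names, the smoothing parameter `alpha`, and the toy English headlines are all assumptions invented for this illustration; none of them come from the Slovenian corpus.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes on tokenised documents.

    docs   -- list of token lists
    labels -- list of class labels, one per document
    alpha  -- Laplace smoothing constant (assumed value, not from the paper)
    """
    vocab = {tok for doc in docs for tok in doc}
    class_sizes = Counter(labels)
    token_counts = {c: Counter() for c in class_sizes}
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    for c, n_docs in class_sizes.items():
        model["log_prior"][c] = math.log(n_docs / len(docs))
        # Smoothed denominator: tokens seen in class c plus alpha per vocab entry.
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        model["log_likelihood"][c] = {
            tok: math.log((token_counts[c][tok] + alpha) / total) for tok in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior; unseen tokens are skipped."""
    scores = {}
    for c, log_prior in model["log_prior"].items():
        log_lik = model["log_likelihood"][c]
        scores[c] = log_prior + sum(
            log_lik[tok] for tok in doc if tok in model["vocab"]
        )
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the given label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


# Hypothetical toy data, invented for this sketch only.
TRAIN_DOCS = [
    "profits rise as exports grow".split(),
    "bank reports record growth".split(),
    "company wins major contract".split(),
    "unemployment rises as factory closes".split(),
    "bank reports heavy losses".split(),
    "company files for bankruptcy".split(),
]
TRAIN_LABELS = ["positive"] * 3 + ["negative"] * 3

MODEL = train_multinomial_nb(TRAIN_DOCS, TRAIN_LABELS)
```

Calling `predict(MODEL, "record growth in exports".split())` classifies a new headline, and `accuracy(MODEL, docs, labels)` gives the classification accuracy on any labelled set. A realistic evaluation would measure accuracy on held-out documents rather than on the training data.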

Acknowledgments

This work was supported by the Creative Core FISNM-3330-13-500033 ‘Simulations’ project, funded by the European Union, the European Regional Development Fund, and by the Young Researcher Programme of the Slovenian Research Agency. The operation is carried out within the framework of the Operational Programme for Strengthening Regional Development Potentials for the period 2007–2013, Development Priority 1: Competitiveness and research excellence, Priority Guideline 1.1: Improving the competitive skills and research excellence.

Author information

Authors and Affiliations

Faculty of Information Studies, Laboratory of Data Technologies, Ulica talcev 3, 8000, Novo mesto, Slovenia
Jože Bučar & Janez Povh
Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000, Ljubljana, Slovenia
Martin Žnidaršič

Corresponding author

Correspondence to Jože Bučar.

Editor information

Editors and Affiliations

Department of Systems, Wrocław University of Technology, Wroclaw, Poland
Robert Burduk
Department of Systems and Computer, Wrocław University of Technology, Wroclaw, Poland
Konrad Jackowski
Department of Systems and Computer, Wrocław University of Technology, Wroclaw, Poland
Marek Kurzyński
Dept. of Systems and Computer Networks, Wrocław University of Technology, Wroclaw, Poland
Michał Woźniak
Department of Systems, Wrocław University of Technology, Wroclaw, Poland
Andrzej Żołnierek

About this paper

Cite this paper

Bučar, J., Povh, J., Žnidaršič, M. (2016). Sentiment Classification of the Slovenian News Texts. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_73

DOI: https://doi.org/10.1007/978-3-319-26227-7_73
Published: 05 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26225-3
Online ISBN: 978-3-319-26227-7
eBook Packages: Engineering, Engineering (R0)
