
Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 403)

Abstract

This paper addresses automatic two-class, document-level sentiment classification. We retrieved textual documents with political, business, economic and financial content from five Slovenian web media. By annotating a sample of 10,427 documents, we obtained a labelled corpus in the Slovenian language. Five classifiers were evaluated on this corpus: multinomial naïve Bayes, support vector machines, random forest, k-nearest neighbour and naïve Bayes, of which the first three were also used to assess the pre-processing options. Among the selected classifiers, multinomial naïve Bayes achieves the highest classification accuracy, outperforming the naïve Bayes, k-nearest neighbour, random forest and support vector machines classifiers. With the best selection of pre-processing options, multinomial naïve Bayes achieves more than 95 % classification accuracy, and the support vector machines and random forest classifiers achieve more than 85 %.
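As an illustration of the best-performing classifier named in the abstract, the following is a minimal sketch of multinomial naïve Bayes with Laplace smoothing for two-class document classification. It is not the authors' implementation: the class `MultinomialNB`, the helpers `tokenize` and `accuracy`, the smoothing parameter `alpha`, and the English toy sentences are all assumptions made for this example, and the whitespace tokenizer only stands in for the pre-processing options evaluated in the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real pre-processing)."""
    return text.lower().split()


class MultinomialNB:
    """Hypothetical minimal multinomial naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        # Per-class token frequencies.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary is the union of tokens seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        """Log prior plus the sum of smoothed log token likelihoods."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the given label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(docs)


# Invented toy data, not taken from the annotated Slovenian corpus.
train_docs = [
    "profits rise strong growth",
    "record growth and strong exports",
    "losses deepen as debt grows",
    "bankruptcy fears and weak demand",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = MultinomialNB().fit(train_docs, train_labels)
test_docs = ["strong profits", "weak demand and losses"]
test_labels = ["positive", "negative"]
acc = accuracy(model, test_docs, test_labels)
```

Classification accuracy here is simply the share of correctly labelled test documents. A realistic evaluation would use a held-out split or cross-validation on the annotated corpus rather than two hand-written sentences.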



Acknowledgments

This work was supported by the Creative Core FISNM-3330-13-500033 ‘Simulations’ project, funded by the European Union, the European Regional Development Fund, and by the Young Researcher Programme of the Slovenian Research Agency. The operation is carried out within the framework of the Operational Programme for Strengthening Regional Development Potentials for the period 2007–2013, Development Priority 1: Competitiveness and research excellence, Priority Guideline 1.1: Improving the competitive skills and research excellence.

Author information

Corresponding author

Correspondence to Jože Bučar.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Bučar, J., Povh, J., Žnidaršič, M. (2016). Sentiment Classification of the Slovenian News Texts. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_73

  • DOI: https://doi.org/10.1007/978-3-319-26227-7_73

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26225-3

  • Online ISBN: 978-3-319-26227-7

  • eBook Packages: Engineering, Engineering (R0)
