Hybrid sentiment analysis framework for a morphologically rich language

Journal of Intelligent Information Systems

Abstract

This paper describes the construction of a Sentiment Analysis Framework for Serbian (SAFOS). We developed a hybrid method whose feature selection draws on a sentiment lexicon and on Serbian WordNet (SWN) synsets annotated with sentiment polarity scores. Because stemming in morphologically rich languages (MRLs) can strip a word of its sentiment or assign it an incorrect one, we instead expanded both the sentiment lexicon and the lexicon generated from SWN with the morphological forms of emotional terms and phrases, using the Serbian Morphological Electronic Dictionaries. We propose a new feature reduction method for document-level sentiment polarity classification with maximum entropy modeling. It maps a large number of related feature candidates (sentiment words, phrases and their inflectional forms) to a few concepts and uses those concepts as features. The method was evaluated with 10-fold cross-validation and on test sets containing news and movie reviews. In all experiments, reducing the feature set by sentiment feature mapping gave better results than the basic feature set. On both test sets, the best classification accuracy was achieved by the combination of unigram and bigram features reduced by sentiment feature mapping (78.3 % for the movie review test set and 79.2 % for the news test set). In 10-fold cross-validation, the best average accuracy, 95.6 %, was obtained using unigram features reduced by the mapping procedure.
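The feature mapping idea in the abstract can be illustrated with a minimal sketch. It is not the SAFOS implementation: the tiny lexicon, the concept labels `POS` and `NEG`, the training sentences, and the helper names `map_features`, `train_maxent` and `predict_proba` are all assumptions made for illustration. A two-class maximum entropy model is equivalent to logistic regression, so the sketch trains one by plain gradient ascent and uses unigram features only.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (not from the paper): inflected forms of
# "dobar" (good) and "loš" (bad), each mapped to one concept label.
FORM_TO_CONCEPT = {
    "dobar": "POS", "dobra": "POS", "dobro": "POS",
    "loš": "NEG", "loša": "NEG", "loše": "NEG",
}

def map_features(tokens):
    """Count unigrams, replacing every lexicon form with its concept."""
    return Counter(FORM_TO_CONCEPT.get(t.lower(), t.lower()) for t in tokens)

def predict_proba(weights, feats):
    """Probability of the positive class under a binary maxent model."""
    z = sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(docs, labels, epochs=200, lr=0.5):
    """Fit feature weights by batch gradient ascent on the log-likelihood."""
    weights = {}
    for _ in range(epochs):
        grad = {}
        for feats, y in zip(docs, labels):
            err = y - predict_proba(weights, feats)
            for f, v in feats.items():
                grad[f] = grad.get(f, 0.0) + err * v
        for f, g in grad.items():
            weights[f] = weights.get(f, 0.0) + lr * g / len(docs)
    return weights

# Hypothetical training documents: 1 = positive, 0 = negative.
train = [
    ("film je dobar", 1), ("gluma je dobra", 1),
    ("film je loš", 0), ("priča je loša", 0),
]
docs = [map_features(text.split()) for text, _ in train]
labels = [y for _, y in train]
weights = train_maxent(docs, labels)

# "dobro" and "loše" never occur in the training data, but they share
# the concept features learned from the other inflected forms.
p_pos = predict_proba(weights, map_features("vreme je dobro".split()))
p_neg = predict_proba(weights, map_features("sve je loše".split()))
print(p_pos > 0.5, p_neg < 0.5)
```

Without the mapping step, the unseen forms would carry no weight and the model would return a probability of 0.5 for both test sentences. Mapping also shrinks the feature space, since six surface forms collapse into two features here.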

Notes

  1. bez- and bes- are two allomorphs

  2. smejala,smejati.V:Gpn and smejala,smejati.V:Gsf are two different grammatical readings of the single word form smejala: the first is the plural, neuter form of the active past participle, while the second is the singular, feminine form of the active past participle

  3. Serbian WordNet (SWN) is available for download for non-commercial use under the CC-BY-NC license

  4. PWN contains 117 000 synsets (https://wordnet.princeton.edu)

  5. http://www.kurir-info.rs/crna-hronika/

  6. http://www.dobrevesti.rs; http://www.svrljig.info/vesti/dobre-vesti; http://www.ilovezrenjanin.com/category/drustvena-odgovornost-2/

  7. http://www.vesti.rs/Dobre-vesti/

  8. http://pistaljka.rs/

  9. http://2kokice.com/

  10. http://www.korpus.matf.bg.ac.rs/index.html

Acknowledgments

This research was partially supported by the project III 47003, financed by the Serbian Ministry of Education, Science and Technological Development.

Author information

Corresponding author

Correspondence to Miljana Mladenović.

About this article

Cite this article

Mladenović, M., Mitrović, J., Krstev, C. et al. Hybrid sentiment analysis framework for a morphologically rich language. J Intell Inf Syst 46, 599–620 (2016). https://doi.org/10.1007/s10844-015-0372-5

  • DOI: https://doi.org/10.1007/s10844-015-0372-5
