A Comparison Study on Ensemble Strategies and Feature Sets for Sentiment Analysis

  • Conference paper
Information Sciences and Systems 2015

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 363)

Abstract

This paper compares common base and ensemble classifiers for sentiment classification of reviews. It also generates different feature sets and examines their contribution to classification accuracy. The feature sets are formed in a hierarchical manner: part-of-speech (POS) based word groups are formed first, and feature frequencies, SentiWordNet scores and their combination are then used to obtain the feature sets. Several common base classifiers, namely Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Voted Perceptron (VP) and K-Nearest Neighbor (k-NN), as well as the common ensemble strategies Random Forests (RFs), Stacking and Random Subspace (RSS), are each tested on the generated feature sets. In addition, a Behavior-Knowledge Space (BKS) method has been derived for application to the set of outcomes of the different algorithm and feature set combinations. Furthermore, a probability-based meta-classifier technique has been tested on this set of outcomes. Finally, the Information Gain (IG) feature selection technique has been applied to reduce the feature spaces. The experiments are conducted on a widely used movie review dataset and an equally common multi-domain review dataset. The results indicate that the probabilistic ensemble method generally gives better results than the other algorithms tested on the chosen datasets, and that the IG method can be used to save computation time while maintaining acceptable accuracy.
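To make the BKS idea concrete, the following is a minimal sketch of the classic Behavior-Knowledge Space combiner, not the derived variant used in the paper, whose details the abstract does not give. Each distinct pattern of base-classifier decisions indexes a cell, and the cell stores the true label most often observed with that pattern. The function names `fit_bks` and `predict_bks`, the toy labels, and the majority-vote fallback for unseen patterns are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_bks(base_predictions, true_labels):
    """Build a BKS lookup table from training-set decisions.

    base_predictions: one tuple per document, holding the label that each
    base classifier (or classifier/feature-set combination) predicted.
    true_labels: the gold label of each document, in the same order.
    """
    cells = defaultdict(Counter)
    for decisions, label in zip(base_predictions, true_labels):
        cells[tuple(decisions)][label] += 1
    # Each cell keeps the true label seen most often with that pattern.
    return {pattern: counts.most_common(1)[0][0]
            for pattern, counts in cells.items()}


def predict_bks(table, decisions):
    """Look up the decision pattern; fall back to majority voting
    (an assumption of this sketch) when the pattern was never seen."""
    decisions = tuple(decisions)
    if decisions in table:
        return table[decisions]
    return Counter(decisions).most_common(1)[0][0]


# Toy data: three hypothetical base classifiers, five training reviews.
train_preds = [
    ("pos", "pos", "neg"),
    ("pos", "pos", "neg"),
    ("pos", "pos", "neg"),
    ("neg", "neg", "pos"),
    ("pos", "neg", "neg"),
]
train_labels = ["neg", "neg", "pos", "pos", "neg"]

table = fit_bks(train_preds, train_labels)
seen = predict_bks(table, ("pos", "pos", "neg"))
unseen = predict_bks(table, ("neg", "neg", "neg"))
```

In the toy data the pattern `("pos", "pos", "neg")` co-occurs more often with the true label `neg`, so the table overrides what a plain majority vote would return. This is the property that lets BKS exploit systematic disagreements among base classifiers.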

Author information

Correspondence to Deniz Aldogan.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Aldogan, D., Yaslan, Y. (2016). A Comparison Study on Ensemble Strategies and Feature Sets for Sentiment Analysis. In: Abdelrahman, O., Gelenbe, E., Gorbil, G., Lent, R. (eds) Information Sciences and Systems 2015. Lecture Notes in Electrical Engineering, vol 363. Springer, Cham. https://doi.org/10.1007/978-3-319-22635-4_33

  • DOI: https://doi.org/10.1007/978-3-319-22635-4_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22634-7

  • Online ISBN: 978-3-319-22635-4

  • eBook Packages: Engineering, Engineering (R0)
