A Comparison Study on Ensemble Strategies and Feature Sets for Sentiment Analysis

Aldogan, Deniz; Yaslan, Yusuf

doi:10.1007/978-3-319-22635-4_33

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 363)

Abstract

This paper compares common base and ensemble classifiers for sentiment classification of reviews. It also generates different feature sets and observes their contribution to classification accuracy. The feature sets are formed in a hierarchical manner: part-of-speech (POS) based word groups are formed first, and feature frequencies, SentiWordNet scores and their combination are then used to obtain the feature sets. Several common base classifiers, namely Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Voted Perceptron (VP) and k-Nearest Neighbor (k-NN), as well as the common ensemble strategies Random Forests (RFs), Stacking and Random Subspace (RSS), are each tested on the generated feature sets. The Behavior-Knowledge Space (BKS) method is also derived and applied to the set of outcomes for different algorithm and feature set combinations, and a probability-based meta-classifier technique is tested on the same set of outcomes. Finally, the Information Gain (IG) feature selection technique is applied to reduce the feature spaces. The experiments are conducted on a widely used movie review dataset and an equally common multi-domain review dataset. The results indicate that the probabilistic ensemble method generally gives better results than the other algorithms tested on the chosen datasets, and that the IG method can be used to save computation time while maintaining acceptable accuracy.
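To make the BKS idea concrete, the following is a minimal sketch of the standard Behavior-Knowledge Space combination rule, not the specific variant derived in the paper. Each distinct tuple of base-classifier decisions indexes a cell that stores how often each true label occurred with that tuple during training; prediction returns the most frequent label in the matching cell. The function names, the toy decisions and labels, and the majority-vote fallback for unseen tuples are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_bks(base_predictions, labels):
    """Build a BKS table with one cell per tuple of base-classifier decisions.

    Each cell counts how often every true label co-occurred with that tuple.
    """
    table = defaultdict(Counter)
    for decisions, label in zip(base_predictions, labels):
        table[tuple(decisions)][label] += 1
    return table


def predict_bks(table, decisions):
    """Return the most frequent training label in the matching cell.

    Falls back to a majority vote over the base decisions when the tuple was
    never seen in training (an assumed fallback; other choices are possible).
    """
    cell = table.get(tuple(decisions))
    if cell:
        return cell.most_common(1)[0][0]
    return Counter(decisions).most_common(1)[0][0]


# Hypothetical outputs of three base classifiers on five training reviews.
train_decisions = [
    ("pos", "pos", "neg"),
    ("pos", "pos", "neg"),
    ("pos", "pos", "neg"),
    ("neg", "pos", "neg"),
    ("neg", "neg", "neg"),
]
train_labels = ["neg", "neg", "pos", "pos", "neg"]

bks_table = fit_bks(train_decisions, train_labels)

# Tuple seen in training: the cell holds two "neg" and one "pos" label.
seen = predict_bks(bks_table, ("pos", "pos", "neg"))

# Tuple not seen in training: the majority-vote fallback is used.
unseen = predict_bks(bks_table, ("neg", "neg", "pos"))

print(seen, unseen)
```

In this toy example a plain majority vote over ("pos", "pos", "neg") would return "pos", whereas the BKS table returns the label that most often accompanied that exact pattern of decisions in training. This is the sense in which BKS exploits the joint behavior of the base classifiers rather than treating their votes independently.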

Author information

Authors and Affiliations

B3Lab, Department of Cloud Computing and Big Data Analysis Systems, Information Technologies Institute, Bilgem, Tubitak
Deniz Aldogan
Faculty of Computer and Informatics, Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
Yusuf Yaslan

Corresponding author

Correspondence to Deniz Aldogan.

Editor information

Editors and Affiliations

Department of Electrical and Electronic Engineering, Imperial College, London, United Kingdom
Omer H. Abdelrahman
Department of Electrical and Electronic Engineering, Imperial College, London, United Kingdom
Erol Gelenbe
Department of Electrical and Electronic Engineering, Imperial College, London, United Kingdom
Gokce Gorbil
Department of Engineering Technology, University of Houston, Houston, Texas, USA
Ricardo Lent

About this paper

Cite this paper

Aldogan, D., Yaslan, Y. (2016). A Comparison Study on Ensemble Strategies and Feature Sets for Sentiment Analysis. In: Abdelrahman, O., Gelenbe, E., Gorbil, G., Lent, R. (eds) Information Sciences and Systems 2015. Lecture Notes in Electrical Engineering, vol 363. Springer, Cham. https://doi.org/10.1007/978-3-319-22635-4_33

DOI: https://doi.org/10.1007/978-3-319-22635-4_33
Published: 04 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22634-7
Online ISBN: 978-3-319-22635-4
eBook Packages: Engineering (R0)