A Performance Comparison of Feature Extraction Methods for Sentiment Analysis

Hung, Lai Po; Alfred, Rayner

doi:10.1007/978-3-319-56660-3_33


Part of the book series: Studies in Computational Intelligence (SCI, volume 710)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems


Abstract

Sentiment analysis is the task of classifying documents according to their sentiment polarity. Before sentiment documents can be classified, plain text must be transformed into data the classifier can work with. This step is known as feature extraction. Feature extraction produces text representations enriched with information that can improve classification results. The experiment in this work investigates the effect of applying different sets of extracted features and discusses how those features behave in sentiment analysis. The feature extraction methods compared are unigrams, bigrams, trigrams, part-of-speech (POS) and SentiWordNet. Unigram, POS and SentiWordNet features are word-based, whereas bigrams and trigrams are phrase-based. In the experimental results, phrase-based features are more effective for sentiment analysis, producing much higher accuracies than word-based features. This may be because word-based features disregard the sentence structure and word order of the original text and thus distort its meaning. Bigram and trigram features retain some of the word sequence and therefore represent the text better.
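The contrast between word-based and phrase-based features can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the helper names `tokenize` and `ngram_features`, and the two example sentences are assumptions chosen only for illustration. It shows that two sentences with opposite polarity can share identical unigram counts while their bigram counts differ.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def ngram_features(text: str, n: int) -> Counter:
    """Count every contiguous run of n tokens in the text."""
    tokens = tokenize(text)
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


# Hypothetical sentences: same words, opposite sentiment.
a = "the plot was good not bad"
b = "the plot was bad not good"

# Unigram counts ignore word order, so the two sentences look identical.
unigrams_equal = ngram_features(a, 1) == ngram_features(b, 1)

# Bigram counts keep local order ("not bad" vs. "not good"), so they differ.
bigrams_equal = ngram_features(a, 2) == ngram_features(b, 2)

print(unigrams_equal, bigrams_equal)
```

Trigram features follow from the same helper with `n=3`. POS and SentiWordNet features would need an external tagger and lexicon, so they are left out of this sketch.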




Author information

Authors and Affiliations

Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia
Lai Po Hung & Rayner Alfred


Corresponding author

Correspondence to Lai Po Hung.

Editor information

Editors and Affiliations

Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wroclaw, Poland
Dariusz Król
Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Kiyoaki Shirai


About this chapter

Cite this chapter

Hung, L.P., Alfred, R. (2017). A Performance Comparison of Feature Extraction Methods for Sentiment Analysis. In: Król, D., Nguyen, N., Shirai, K. (eds) Advanced Topics in Intelligent Information and Database Systems. ACIIDS 2017. Studies in Computational Intelligence, vol 710. Springer, Cham. https://doi.org/10.1007/978-3-319-56660-3_33


DOI: https://doi.org/10.1007/978-3-319-56660-3_33
Published: 23 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56659-7
Online ISBN: 978-3-319-56660-3
eBook Packages: Engineering, Engineering (R0)
