Sentimental feature selection for sentiment analysis of Chinese online reviews

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

With the growing availability and popularity of online reviews, sentiment analysis has emerged in response to the need to organize useful information quickly. Feature selection directly affects how online reviews are represented and poses many challenges for sentiment analysis. However, little attention has so far been paid to feature selection for Chinese online reviews. We are therefore motivated to explore the effects of feature selection on sentiment analysis of Chinese online reviews. First, N-char-grams and N-POS-grams are selected as candidate sentimental features. Then, an improved Document Frequency method is used to select feature subsets, and the Boolean Weighting method is adopted to calculate feature weights. Finally, experiments are conducted on online reviews of mobile phones, and a chi-square test is carried out to assess the significance of the experimental results. The results suggest that sentiment analysis of Chinese online reviews achieves higher accuracy when 4-POS-grams are used as features. In addition, when N-char-grams are used as features, low-order N-char-grams perform better than high-order N-char-grams. Furthermore, the improved Document Frequency method achieves a significant improvement in sentiment analysis of Chinese online reviews.
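The feature pipeline described in the abstract (n-gram features, document-frequency selection, Boolean weighting, and a chi-square significance test) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the improved Document Frequency method differs from the standard one, so plain DF thresholding stands in for it; POS tags are assumed to come from an external tagger; and the chi-square test is assumed to compare correct/incorrect prediction counts from two settings in a 2x2 table. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def char_ngrams(text: str, n: int) -> Set[str]:
    """Set of contiguous character n-grams (N-char-grams) in `text`.

    Chinese text has no spaces between words, so character n-grams can be
    read straight off the raw string without word segmentation.
    """
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def pos_ngrams(tags: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """Set of n-grams over a POS-tag sequence (N-POS-grams).

    Assumption: the tags come from an external Chinese POS tagger,
    which is not included here.
    """
    return {tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)}


def select_by_df(docs: List[Set], min_df: int) -> List:
    """Keep features that occur in at least `min_df` documents.

    Plain document-frequency thresholding, used here only as a stand-in
    for the paper's improved DF method, whose details are not given.
    """
    # Each doc is a set, so these counts are document frequencies.
    df = Counter(f for doc in docs for f in doc)
    return sorted(f for f, count in df.items() if count >= min_df)


def boolean_vectors(docs: List[Set], vocab: List) -> List[List[int]]:
    """Boolean weighting: 1 if a selected feature occurs in the document, else 0."""
    return [[1 if f in doc else 0 for f in vocab] for doc in docs]


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]], e.g. correct and
    incorrect predictions under two feature settings."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Toy usage on three made-up review snippets (not the paper's data).
reviews = ["屏幕很好", "屏幕很差", "电池很好"]
docs = [char_ngrams(r, 2) for r in reviews]
vocab = select_by_df(docs, min_df=2)
print(vocab, boolean_vectors(docs, vocab))
```

With bigrams and `min_df=2`, only bigrams shared by at least two snippets (such as 屏幕, "screen", and 很好, "very good") survive, and each snippet becomes a 0/1 vector over them. A statistic from `chi_square_2x2` would then be compared with the critical value 3.841 (p = 0.05, one degree of freedom).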

Acknowledgments

This work is partially supported by NSFC Grants 70971099 and 71371144, the Fundamental Research Funds for the Central Universities (1200219198), and the Shanghai Philosophy and Social Science Planning Projects (2013BGL004).

Author information

Correspondence to Hongwei Wang.

About this article

Cite this article

Zheng, L., Wang, H. & Gao, S. Sentimental feature selection for sentiment analysis of Chinese online reviews. Int. J. Mach. Learn. & Cyber. 9, 75–84 (2018). https://doi.org/10.1007/s13042-015-0347-4

DOI: https://doi.org/10.1007/s13042-015-0347-4