FCE-SVM: a new cluster based ensemble method for opinion mining from social media

  • Original Article
  • Information Systems and e-Business Management

Abstract

Opinion mining, which aims to automatically detect subjective information, has attracted growing interest from both academia and industry in recent years. To improve the performance of opinion mining, several ensemble methods have been investigated and shown to be effective both theoretically and empirically. However, cluster-based ensemble methods have received little attention in this area. In this paper, a new cluster-based ensemble method, FCE-SVM, is proposed for opinion mining from social media. Following the divide-and-conquer philosophy, FCE-SVM first uses a fuzzy clustering module to generate different training sub-datasets. In the second stage, base learners are trained on these different training datasets. Finally, a fusion module combines the results of the base learners. The method is evaluated on multi-domain opinion datasets to verify its effectiveness. Empirical results show that FCE-SVM achieves the best performance by reducing bias and variance simultaneously. These results indicate that FCE-SVM is a viable method for opinion mining.
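The three stages named in the abstract (fuzzy clustering, base learners, fusion) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm, how samples are assigned to sub-datasets, or the fusion rule. The sketch assumes standard fuzzy c-means, a hypothetical membership threshold of 0.3, and a membership-weighted vote, and it substitutes a simple nearest-centroid learner for the SVM so that it runs with the standard library only. All function and class names are illustrative.

```python
import math

def memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each cluster (sums to 1)."""
    dists = [math.dist(x, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # A point lying exactly on a center belongs fully to that cluster.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

def fuzzy_c_means(points, centers, m=2.0, iters=20):
    """Stage 1: refine initial centers with the standard fuzzy c-means update."""
    dim = len(points[0])
    for _ in range(iters):
        u = [memberships(p, centers, m) for p in points]
        updated = []
        for j, old in enumerate(centers):
            w = [row[j] ** m for row in u]
            total = sum(w)
            if total == 0.0:
                updated.append(old)  # no support for this cluster: keep it
                continue
            updated.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / total
                for d in range(dim)))
        centers = updated
    return centers

def split_training_set(points, labels, centers, threshold=0.3, m=2.0):
    """Stage 1 output: one (possibly overlapping) training subset per cluster.
    The threshold is an assumption made for this sketch."""
    subsets = [([], []) for _ in centers]
    for p, y in zip(points, labels):
        for j, u in enumerate(memberships(p, centers, m)):
            if u >= threshold:
                subsets[j][0].append(p)
                subsets[j][1].append(y)
    return subsets

class CentroidLearner:
    """Stage 2 stand-in base learner (NOT an SVM): nearest class centroid."""
    def fit(self, xs, ys):
        self.means = {}
        for label in (-1, 1):
            rows = [x for x, y in zip(xs, ys) if y == label]
            if rows:
                self.means[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
        return self

    def predict(self, x):
        if not self.means:
            return 0  # trained on an empty subset: abstain
        return min(self.means, key=lambda label: math.dist(x, self.means[label]))

def predict_ensemble(x, centers, learners, m=2.0):
    """Stage 3: membership-weighted vote (an assumed fusion rule)."""
    u = memberships(x, centers, m)
    vote = sum(u_j * learner.predict(x) for u_j, learner in zip(u, learners))
    return 1 if vote >= 0 else -1

# Toy data: two well-separated groups whose polarity direction is flipped.
points = [(-0.5, 0.0), (-0.4, 0.2), (0.5, 0.0), (0.4, -0.2),
          (9.5, 10.0), (9.6, 10.2), (10.5, 10.0), (10.4, 9.8)]
labels = [-1, -1, 1, 1, 1, 1, -1, -1]

centers = fuzzy_c_means(points, [(1.0, 1.0), (9.0, 9.0)])
subsets = split_training_set(points, labels, centers)
learners = [CentroidLearner().fit(xs, ys) for xs, ys in subsets]
print(predict_ensemble((0.6, 0.0), centers, learners))
```

In this toy data the positive and negative classes swap sides between the two groups, so the class centroids computed over the whole set coincide and a single stand-in learner cannot separate them, while each per-cluster learner sees a consistent pattern. To move closer to the method's name, the stand-in could be replaced by an SVM from a library of choice that exposes the same `fit` and `predict` calls.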

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (Nos. 71101042, 71471054), the National Program on Key Basic Research Project (973 Program) (No. 2013CB329603), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20110111120014), and the China Postdoctoral Science Foundation (Nos. 2011M501041, 2013T60611).

Author information

Corresponding author

Correspondence to Daqing Zheng.

About this article

Cite this article

Wang, G., Zheng, D., Yang, S. et al. FCE-SVM: a new cluster based ensemble method for opinion mining from social media. Inf Syst E-Bus Manage 16, 721–742 (2018). https://doi.org/10.1007/s10257-017-0352-0

  • DOI: https://doi.org/10.1007/s10257-017-0352-0
