Abstract
Opinion mining, which aims to automatically detect subjective information, has attracted growing interest from both academia and industry in recent years. To improve the performance of opinion mining, several ensemble methods have been investigated and shown to be effective both theoretically and empirically. However, cluster-based ensemble methods have received less attention in this area. In this paper, a new cluster-based ensemble method, FCE-SVM, is proposed for opinion mining from social media. Following the divide-and-conquer philosophy, FCE-SVM first uses a fuzzy clustering module to generate different training sub-datasets. In the second stage, base learners are trained on these different training datasets. Finally, a fusion module combines the results of the base learners. Multi-domain opinion datasets are used to verify the effectiveness of the proposed method. Empirical results reveal that FCE-SVM achieves the best performance by reducing bias and variance simultaneously. These results indicate that FCE-SVM is a viable method for opinion mining.
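The three stages named in the abstract (fuzzy clustering, base learner training, fusion) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: standard fuzzy c-means with a deterministic initialization, a membership threshold (`threshold`) to form overlapping sub-datasets, a nearest-centroid classifier standing in for the SVM base learner, and majority voting as the fusion rule. All function names are hypothetical.

```python
from collections import Counter

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def fuzzy_c_means(points, c, m=2.0, iters=30):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    n, d = len(points), len(points[0])
    # Assumption: deterministic start from evenly spaced data points.
    centers = [list(points[(j * n) // c]) for j in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        for i in range(n):
            dist = [max(sq_dist(points[i], centers[j]) ** 0.5, 1e-12)
                    for j in range(c)]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
        # Center update: mean of the points weighted by u_ij^m.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers[j] = [sum(w[i] * points[i][k] for i in range(n)) / total
                          for k in range(d)]
    return centers, u

def train_centroid_learner(xs, ys):
    """Stand-in base learner (NOT an SVM): nearest class centroid."""
    sums, counts = {}, Counter(ys)
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
    cents = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    return lambda x: min(cents, key=lambda y: sq_dist(x, cents[y]))

def train_ensemble(xs, ys, c=2, threshold=0.3):
    """Stages 1 and 2: one base learner per fuzzy cluster."""
    _, u = fuzzy_c_means(xs, c)
    learners = []
    for j in range(c):
        # Assumption: a sample joins sub-dataset j if its membership is high enough.
        idx = [i for i in range(len(xs)) if u[i][j] >= threshold]
        if idx:
            learners.append(train_centroid_learner([xs[i] for i in idx],
                                                   [ys[i] for i in idx]))
    return learners

def predict(learners, x):
    """Stage 3: fuse the base learners' outputs by majority vote."""
    return Counter(h(x) for h in learners).most_common(1)[0][0]

# Toy data: two clusters along the first axis, two classes along the second.
X = [(0.0, -1.0), (0.2, -1.1), (0.0, 1.0), (0.2, 1.1),
     (10.0, -1.0), (10.2, -1.1), (10.0, 1.0), (10.2, 1.1)]
y = ["neg", "neg", "pos", "pos", "neg", "neg", "pos", "pos"]
ensemble = train_ensemble(X, y)
print(predict(ensemble, (5.0, 2.0)))
```

In a realistic setting the feature vectors would come from text (for example, term weights), and the stand-in learner would be replaced by an SVM; the clustering and fusion steps would keep the same shape.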
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (Nos. 71101042, 71471054), the National Program on Key Basic Research Project (973 Program) (No. 2013CB329603), the Specialized Research Fund for the Doctoral Program of Higher Education (20110111120014), and the China Postdoctoral Science Foundation (2011M501041, 2013T60611).
Cite this article
Wang, G., Zheng, D., Yang, S. et al. FCE-SVM: a new cluster based ensemble method for opinion mining from social media. Inf Syst E-Bus Manage 16, 721–742 (2018). https://doi.org/10.1007/s10257-017-0352-0
DOI: https://doi.org/10.1007/s10257-017-0352-0