
Sentiment analysis based on clustering: a framework in improving accuracy and recognizing neutral opinions

Applied Intelligence

Abstract

Clustering-based sentiment analysis is a novel approach for analyzing opinions expressed in reviews, comments or blogs. In contrast to the two traditional mainstream approaches (supervised learning and symbolic techniques), the clustering-based approach can produce reasonably accurate results without any human participation, linguistic knowledge or training time.
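To make the idea concrete, here is a minimal sketch of the unsupervised core of such an approach: documents are turned into TF-IDF vectors and split into two groups by k-means, with no labelled training data. The specific choices (raw-count TF-IDF, squared Euclidean distance, deterministic farthest-first initialization, and the helper names `tfidf_vectors`, `sq_dist` and `kmeans`) are our assumptions for illustration, not details taken from the paper. Deciding which cluster is positive and which is negative is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenized documents to dense TF-IDF vectors (raw count x log(N/df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k=2, max_iters=50):
    """Plain k-means with deterministic farthest-first initialization (assumed)."""
    # Initialization: start from the first vector, then repeatedly add the
    # vector that is farthest from every centroid chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iters):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    docs = [["great", "good", "love"], ["good", "great", "excellent"],
            ["bad", "awful", "hate"], ["awful", "bad", "terrible"]]
    labels, _ = kmeans(tfidf_vectors(docs), k=2)
    print(labels)  # [0, 0, 1, 1]
```

On this toy input, the two documents sharing positive words land in one cluster and the two sharing negative words in the other; on real reviews the clusters are noisier, which is what the accuracy-enhancing techniques described next aim to address.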

This paper introduces new techniques designed to extend the capability of the clustering-based sentiment analysis approach in two ways: first, by applying techniques for processing opposite-opinion content and non-opinion content to further improve accuracy; and second, by using a modified voting mechanism and distance measure to perform fine-grained (three-class) sentiment analysis. The experimental results show that the clustering-based approach produces high-quality sentiment analysis results and is suitable for recognizing neutral opinions.
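The abstract does not say how the modified voting mechanism or the distance measure work, so the following is only a hypothetical illustration of how a third, neutral class could be derived from a two-cluster result: a document whose distances to the positive and negative centroids are nearly equal is labelled neutral, and labels from several clustering runs are combined by majority vote. The function names, the relative-gap rule, the `margin=0.1` threshold and the tie-to-neutral rule are our assumptions, and the sketch assumes each cluster has already been mapped to a polarity.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def three_class_label(vec, pos_centroid, neg_centroid, margin=0.1):
    """Hypothetical rule: a document is neutral when it is nearly
    equidistant from the positive and negative centroids."""
    d_pos = euclidean(vec, pos_centroid)
    d_neg = euclidean(vec, neg_centroid)
    total = d_pos + d_neg
    # Relative gap lies in [0, 1]; a small gap means no clear polarity.
    if total == 0 or abs(d_pos - d_neg) / total < margin:
        return "neutral"
    return "positive" if d_pos < d_neg else "negative"


def vote(labels):
    """Majority vote over labels from several clustering runs.
    A tie for first place is resolved as 'neutral' (our assumption)."""
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]


if __name__ == "__main__":
    pos, neg = [1.0, 0.0], [0.0, 1.0]
    print(three_class_label([0.9, 0.1], pos, neg))     # positive
    print(three_class_label([0.5, 0.5], pos, neg))     # neutral
    print(vote(["positive", "negative", "positive"]))  # positive
```

Treating a tie between runs as neutral reflects the intuition that disagreement between clusterings signals an opinion without a clear polarity; the paper's actual criteria may differ.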


Figures and algorithms: Fig. 1–5, Algorithm 1–3


Notes

  1. Once the TF-IDF weights have been calculated from term frequencies, the same weight values can also be applied to term-presence data.

  2. In this paper, documents with large proportions of objective content are not regarded as neutral documents. The objects of study are opinion-expressing documents, although these usually contain small proportions of objective content.
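Note 1 is brief, so the sketch below adopts one possible reading of it, which is our assumption: the IDF factor is estimated once from term counts across the corpus and then reused unchanged when a document is represented by term presence (each term counted once) instead of term frequency. The names `idf_weights`, `weight_by_frequency` and `weight_by_presence` are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    """IDF per term, log(N / df), estimated once from the whole corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log(n / df[t]) for t in df}


def weight_by_frequency(doc, idf):
    """Frequency representation: raw term count times IDF."""
    tf = Counter(doc)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def weight_by_presence(doc, idf):
    """Presence representation: each term counts once (value 1), times the same IDF."""
    return {t: idf[t] for t in set(doc) if t in idf}


if __name__ == "__main__":
    docs = [["good", "good", "movie"], ["bad", "movie"]]
    idf = idf_weights(docs)
    print(weight_by_frequency(docs[0], idf))  # 'good' weighted twice; 'movie' is 0.0
    print(weight_by_presence(docs[0], idf))   # 'good' weighted once
```

Because the weights are computed only once, switching between the frequency and presence representations does not require re-estimating anything from the corpus.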


Author information


Correspondence to Gang Li.


About this article

Cite this article

Li, G., Liu, F. Sentiment analysis based on clustering: a framework in improving accuracy and recognizing neutral opinions. Appl Intell 40, 441–452 (2014). https://doi.org/10.1007/s10489-013-0463-3



  • DOI: https://doi.org/10.1007/s10489-013-0463-3
