Sentiment analysis based on clustering: a framework in improving accuracy and recognizing neutral opinions

Li, Gang; Liu, Fei

doi:10.1007/s10489-013-0463-3

Published: 15 September 2013

Volume 40, pages 441–452, (2014)
Applied Intelligence

Abstract

Clustering-based sentiment analysis is a novel approach to analyzing opinions expressed in reviews, comments, or blogs. In contrast to the two traditional mainstream approaches (supervised learning and symbolic techniques), the clustering-based approach can produce reasonably accurate results without human participation, linguistic knowledge, or training time.

This paper introduces new techniques that extend the clustering-based sentiment analysis approach in two respects: first, by applying opposite-opinion content processing and non-opinion content processing to further improve accuracy; and second, by using a modified voting mechanism and distance measurement method to conduct fine-grained (three-class) sentiment analysis. The experimental results indicate that the clustering-based approach produces high-quality sentiment analysis results and is suitable for recognizing neutral opinions.
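The abstract does not specify the algorithm, so the following is only a generic sketch of the kind of pipeline it describes: documents are turned into TF-IDF vectors, clustered into two groups by several k-means runs, and the runs are combined by a vote. Everything here is an assumption made for illustration, including the function names (`tfidf_vectors`, `kmeans_two_clusters`, `vote`), the smoothed IDF formula, the use of cosine similarity, the `min_share` threshold, and the idea of leaving a document undecided (`None`) when the runs disagree. It should not be read as the authors' voting mechanism or distance measure.

```python
import math
import random
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (dict) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine for unit vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf_vectors(docs):
    """Map tokenised documents to unit-length TF-IDF vectors."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vectors.append(normalise(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        ))
    return vectors


def kmeans_two_clusters(vectors, rng, iterations=20):
    """One k-means run with k=2 and cosine similarity, from a random start."""
    centroids = [dict(v) for v in rng.sample(vectors, 2)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max((0, 1), key=lambda k: dot(v, centroids[k])) for v in vectors]
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[k] = normalise(total)
    return labels


def vote(runs, min_share=0.8):
    """Combine several runs; return 0, 1, or None (undecided) per document."""
    reference = runs[0]
    aligned = []
    for run in runs:
        agreement = sum(a == b for a, b in zip(run, reference))
        # Cluster ids are arbitrary, so flip a run that mostly disagrees.
        aligned.append(run if agreement * 2 >= len(reference) else [1 - x for x in run])
    result = []
    for votes in zip(*aligned):
        share_one = sum(votes) / len(votes)
        if share_one >= min_share:
            result.append(1)
        elif 1 - share_one >= min_share:
            result.append(0)
        else:
            result.append(None)  # no clear majority across runs
    return result


if __name__ == "__main__":
    docs = [d.split() for d in [
        "great film great acting", "wonderful great story",
        "awful plot awful acting", "terrible awful film",
    ]]
    vectors = tfidf_vectors(docs)
    rng = random.Random(0)
    runs = [kmeans_two_clusters(vectors, rng) for _ in range(9)]
    print(vote(runs))
```

The sketch leaves two things open on purpose: which cluster corresponds to positive or negative opinion, and whether undecided documents should be treated as neutral. Both would need a rule that the abstract does not describe.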

Notes

Once the TF-IDF weights have been calculated from term frequencies, the weight values can also be applied to term-presence data.
In this paper, documents with large proportions of objective content are not regarded as neutral documents. The objects of study are opinion-expressing documents, although these usually contain small proportions of objective content.
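One possible reading of the first note is that the per-term weight is learned once from the corpus and then attached either to the term count (frequency) or to a 0/1 indicator (presence). The sketch below illustrates that reading only; the plain `log(N / df)` weight and the function names `idf_weights`, `frequency_vector`, and `presence_vector` are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Per-term weight log(N / df), computed once from the whole corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def frequency_vector(doc, idf):
    """Weight each term by how often it occurs in the document."""
    tf = Counter(doc)
    return {term: count * idf[term] for term, count in tf.items()}


def presence_vector(doc, idf):
    """Weight each term once, regardless of how often it occurs."""
    return {term: idf[term] for term in set(doc)}


if __name__ == "__main__":
    docs = [["good", "good", "film"], ["bad", "film"]]
    idf = idf_weights(docs)
    print(frequency_vector(docs[0], idf))
    print(presence_vector(docs[0], idf))
```

Under this reading the two representations differ only for terms that repeat within a document, such as "good" in the first toy document.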

Author information

Authors and Affiliations

Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia
Gang Li & Fei Liu

Corresponding author

Correspondence to Gang Li.

About this article

Cite this article

Li, G., Liu, F. Sentiment analysis based on clustering: a framework in improving accuracy and recognizing neutral opinions. Appl Intell 40, 441–452 (2014). https://doi.org/10.1007/s10489-013-0463-3

Published: 15 September 2013
Issue Date: April 2014
DOI: https://doi.org/10.1007/s10489-013-0463-3
