Abstract
More people are buying products online and expressing their opinions on these products through online reviews. Sentiment analysis can be used to extract valuable information from these reviews, and the results can benefit both consumers and manufacturers. This paper presents a study comparing two well-known machine learning algorithms, namely a dynamic language model and a naïve Bayes classifier. Experiments were carried out to determine how consistent the results are across datasets of different sizes, and how a balanced or unbalanced dataset affects them. The experimental results indicate that both algorithms achieve better results on a realistic unbalanced dataset than on the balanced datasets commonly used in research.
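The two kinds of classifier compared above can be illustrated with a minimal Python sketch. It assumes a multinomial naïve Bayes model over lowercase whitespace-separated words. As a simplified stand-in for the dynamic language model, whose configuration the abstract does not give, it also trains one character n-gram language model per class and picks the class whose prior-weighted model assigns the text the highest probability. Both models use add-one smoothing for brevity, which is not necessarily what the authors used. The `balance` helper, which downsamples every class to the size of the smallest one, is a hypothetical way to build a balanced dataset; the paper's balanced datasets may have been constructed differently. All names and the toy reviews are illustrative, not taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over lowercase word tokens (add-one smoothing; assumed)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc.lower().split())
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.n = len(labels)
        return self

    def score(self, doc, c):
        # log P(c) + sum of log P(word | c), with add-one smoothing over the vocabulary
        s = math.log(self.prior[c] / self.n)
        for w in doc.lower().split():
            s += math.log((self.counts[c][w] + 1) / (self.totals[c] + len(self.vocab)))
        return s

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.score(doc, c))


class CharLMClassifier:
    """Hypothetical stand-in for a dynamic LM classifier: one character n-gram LM per class."""

    def __init__(self, n=3):
        self.n = n

    def _padded(self, doc):
        # Pad with n-1 spaces so the first characters have a full-length context
        return " " * (self.n - 1) + doc.lower()

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.ngrams = {c: Counter() for c in self.classes}
        self.contexts = {c: Counter() for c in self.classes}
        self.alphabet = set()
        for doc, c in zip(docs, labels):
            text = self._padded(doc)
            self.alphabet.update(text)
            for i in range(self.n - 1, len(text)):
                ctx = text[i - self.n + 1:i]
                self.ngrams[c][ctx + text[i]] += 1
                self.contexts[c][ctx] += 1
        self.total = len(labels)
        return self

    def score(self, doc, c):
        a = len(self.alphabet) + 1  # +1 reserves probability mass for unseen characters
        s = math.log(self.prior[c] / self.total)
        text = self._padded(doc)
        for i in range(self.n - 1, len(text)):
            ctx = text[i - self.n + 1:i]
            s += math.log((self.ngrams[c][ctx + text[i]] + 1) / (self.contexts[c][ctx] + a))
        return s

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.score(doc, c))


def balance(docs, labels, seed=0):
    """Hypothetical helper: downsample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for d, c in zip(docs, labels):
        by_class[c].append(d)
    k = min(len(v) for v in by_class.values())
    pairs = [(d, c) for c, ds in by_class.items() for d in rng.sample(ds, k)]
    rng.shuffle(pairs)
    return [d for d, _ in pairs], [c for _, c in pairs]


def accuracy(model, docs, labels):
    return sum(model.predict(d) == c for d, c in zip(docs, labels)) / len(labels)


if __name__ == "__main__":
    # Toy, unbalanced data for illustration only; not the paper's datasets.
    docs = ["great product works well", "love it great value", "excellent and great",
            "terrible broke fast", "awful waste of money"]
    labels = ["pos", "pos", "pos", "neg", "neg"]
    for model in (NaiveBayes().fit(docs, labels), CharLMClassifier(n=3).fit(docs, labels)):
        print(type(model).__name__, [model.predict(t) for t in ("great value", "awful waste")])
    _, bal_labels = balance(docs, labels)
    print(Counter(bal_labels))
```

To mirror the comparison described in the abstract, one would train each model once on the full unbalanced data and once on the output of `balance`, then evaluate both on the same held-out reviews; the toy data here is far too small for such a comparison to mean anything.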
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Burns, N., Bi, Y., Wang, H., Anderson, T. (2011). Sentiment Analysis of Customer Reviews: Balanced versus Unbalanced Datasets. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science, vol 6881. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23851-2_17
DOI: https://doi.org/10.1007/978-3-642-23851-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23850-5
Online ISBN: 978-3-642-23851-2
eBook Packages: Computer Science, Computer Science (R0)