Sentiment Analysis of Customer Reviews: Balanced versus Unbalanced Datasets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6881)

Abstract

More people are buying products online and expressing their opinions on these products through online reviews. Sentiment analysis can be used to extract valuable information from reviews, and the results can benefit both consumers and manufacturers. This study compares two well-known machine learning algorithms, namely the dynamic language model and the naïve Bayes classifier. Experiments were carried out to determine how consistent the results are across datasets of different sizes, and to measure the effect of using a balanced or an unbalanced dataset. The experimental results indicate that both algorithms can achieve better results on a realistic unbalanced dataset than on the balanced datasets commonly used in research.
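
To make the balanced-versus-unbalanced comparison concrete, here is a minimal Python sketch of a multinomial naïve Bayes classifier with add-one smoothing, trained once on an unbalanced set and once on a downsampled, balanced copy. It is an illustration only and not the authors' implementation: the function names (`train_naive_bayes`, `classify`, `balance`) and the six toy reviews are hypothetical, and the dynamic language model is not shown.

```python
import math
import random
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per class and tokens per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for token in text.lower().split():
            word_counts[label][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # The class prior reflects the class distribution of the training set.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore tokens never seen during training
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def balance(docs, seed=0):
    """Downsample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for doc in docs:
        by_label[doc[1]].append(doc)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    return balanced


# Hypothetical toy reviews: four positive, two negative (unbalanced).
reviews = [
    ("great camera love it", "pos"),
    ("excellent battery great screen", "pos"),
    ("love the sound quality", "pos"),
    ("works great excellent value", "pos"),
    ("terrible battery broke quickly", "neg"),
    ("poor screen terrible value", "neg"),
]

unbalanced_model = train_naive_bayes(reviews)
balanced_model = train_naive_bayes(balance(reviews))

print(classify(unbalanced_model, "great excellent camera"))
print(classify(balanced_model, "terrible poor battery"))
```

In this sketch, balancing changes two things: the class prior becomes uniform, and the majority class loses training examples. Either effect could plausibly contribute to the differences the abstract reports, but the toy data here is far too small to show that.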

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Burns, N., Bi, Y., Wang, H., Anderson, T. (2011). Sentiment Analysis of Customer Reviews: Balanced versus Unbalanced Datasets. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science, vol 6881. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23851-2_17

  • DOI: https://doi.org/10.1007/978-3-642-23851-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23850-5

  • Online ISBN: 978-3-642-23851-2

  • eBook Packages: Computer Science, Computer Science (R0)
