Normalization of Term Weighting Scheme for Sentiment Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

N-gram models with a binary (or tf-idf) weighting scheme and SVM classifiers are commonly used together as a baseline in many studies on sentiment analysis and opinion mining. More advanced methods, such as generating additional features or using supplementary linguistic resources, are applied on top of this model to improve classification accuracy. In this paper, we show how a simple technique, normalizing the term weights in the basic bag-of-words method, can improve both the overall classification accuracy and the classification of minor reviews. Any other term selection scheme based on the n-gram model may also benefit from this improved weighting scheme. We tested our approach on movie review and product review datasets in English and show that our normalization technique improves the classification accuracy of the traditional weighting schemes. Whether similar performance gains would be observed for other language families remains to be investigated, but our weighting scheme can easily be applied to any other language, since it uses no language-specific resource apart from a training corpus.
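As a rough illustration of the baseline feature weighting the abstract refers to, the sketch below computes binary and tf-idf weights over unigrams and bigrams. It is a minimal example under stated assumptions: the function names, the toy documents, and the particular tf-idf variant (raw term frequency times log(N / document frequency)) are illustrative choices, not taken from the paper. The paper's normalization step is not reproduced, since the abstract does not specify it, and the SVM classifier is omitted.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1..n_max as tuples."""
    return [tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def binary_weights(doc_tokens, n_max=2):
    """Binary scheme: weight 1.0 for every n-gram present in the document."""
    return {g: 1.0 for g in set(ngrams(doc_tokens, n_max))}

def tfidf_weights(corpus_tokens, n_max=2):
    """tf-idf scheme (assumed variant): raw tf times log(N / df)."""
    doc_grams = [Counter(ngrams(doc, n_max)) for doc in corpus_tokens]
    n_docs = len(doc_grams)
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for grams in doc_grams for g in grams)
    return [{g: tf * math.log(n_docs / df[g]) for g, tf in grams.items()}
            for grams in doc_grams]

# Hypothetical toy corpus, already tokenized.
docs = [
    "a great great movie".split(),
    "a boring movie".split(),
]
binary = [binary_weights(d) for d in docs]
tfidf = tfidf_weights(docs)
```

In this sketch, n-grams that occur in every document receive a tf-idf weight of zero, while the binary scheme treats all present n-grams equally. Either feature dictionary could then be passed to a linear classifier.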

After completing his PhD at LIMSI-CNRS in 2012, Alexander Pak joined Google in Zürich (alexpak@google.com).

Notes

  1. The Internet Movie Database: http://imdb.com.

  2. http://twitter.com

  3. http://facebook.com

  4. The set of documents expressing opinions about the same opinion target (the same subject), for example, all reviews about the AVATAR movie represent an Opinion Entity Document Set.

  5. http://www.cs.cornell.edu/people/pabo/movie-review-data/

  6. http://www.cs.jhu.edu/~mdredze/datasets/sentiment/

  7. Computed over the Opinion Entity Document Set, i.e. the set of documents expressing opinions about the same opinion target, for example, all reviews about the AVATAR movie.

Author information

Corresponding author

Correspondence to Patrick Paroubek.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Pak, A., Paroubek, P., Fraisse, A., Francopoulo, G. (2014). Normalization of Term Weighting Scheme for Sentiment Analysis. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_10

  • DOI: https://doi.org/10.1007/978-3-319-08958-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08957-7

  • Online ISBN: 978-3-319-08958-4

  • eBook Packages: Computer Science, Computer Science (R0)
