Normalization of Term Weighting Scheme for Sentiment Analysis

Pak, Alexander; Paroubek, Patrick; Fraisse, Amel; Francopoulo, Gil

doi:10.1007/978-3-319-08958-4_10

Conference paper
First Online: 01 January 2014

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

N-gram models with a binary (or tf-idf) weighting scheme and SVM classifiers are commonly used together as a baseline approach in many research studies on sentiment analysis and opinion mining. More advanced methods, such as generating additional features or using supplementary linguistic resources, are applied on top of this model to improve classification accuracy. In this paper, we show how a simple technique, normalizing the term weights in the basic bag-of-words method, can improve both the overall classification accuracy and the classification of minor reviews. Any other term selection scheme may also benefit from this improved weighting scheme, provided it is based on the n-gram model. We have tested our approach on movie review and product review datasets in English and show that our normalization technique enhances the classification accuracy of the traditional weighting schemes. Whether similar performance gains would be observed for other language families remains to be investigated, but our weighting scheme can easily be applied to any other language, since it does not use any language-specific resource apart from a training corpus.
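The baseline weighting schemes mentioned in the abstract can be illustrated with a minimal sketch. It assumes unigram and bigram features, raw term frequency, and an idf of log(N / df); the function names (`ngrams`, `binary_weights`, `tfidf_weights`) and the toy corpus are hypothetical and chosen for illustration only. The sketch does not reproduce the normalization proposed in the paper, which the abstract does not specify.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def binary_weights(doc_tokens, n_max=2):
    """Binary scheme: weight 1.0 for every n-gram present in the document."""
    return {term: 1.0 for term in set(ngrams(doc_tokens, n_max))}


def tfidf_weights(corpus_tokens, n_max=2):
    """tf-idf scheme (assumed variant): raw tf times log(N / df)."""
    counts = [Counter(ngrams(doc, n_max)) for doc in corpus_tokens]
    n_docs = len(counts)
    # Iterating a Counter yields each distinct term once, so this
    # counts the number of documents in which a term occurs.
    df = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


# Toy corpus, made up for illustration (not from the paper's datasets).
corpus = [
    "a great great movie".split(),
    "a boring movie".split(),
]
binary = binary_weights(corpus[0])
tfidf = tfidf_weights(corpus)
```

With this idf variant, a term that occurs in every document receives weight zero, whereas the binary scheme ignores how often and how widely a term occurs. In the baseline setup described above, the resulting sparse vectors would then be passed to an SVM classifier.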

After completing his Ph.D. at LIMSI-CNRS in 2012, Alexander Pak joined Google in Zürich (alexpak@google.com).

Notes

1. The Internet Movie Database: http://imdb.com
2. http://twitter.com
3. http://facebook.com
4. The set of documents expressing opinions about the same opinion target (the same subject); for example, all reviews of the movie AVATAR form an Opinion Entity Document Set.
5. http://www.cs.cornell.edu/people/pabo/movie-review-data/
6. http://www.cs.jhu.edu/~mdredze/datasets/sentiment/
7. Computed over the Opinion Entity Document Set, i.e. the set of documents expressing opinions about the same opinion target; for example, all reviews of the movie AVATAR.

Author information

Authors and Affiliations

LIMSI-CNRS, Université Paris-Sud, 133, 91403, Orsay Cedex, France
Alexander Pak, Patrick Paroubek & Amel Fraisse
TAGMATICA, 126 rue de Picpus, 75012, Paris, France
Gil Francopoulo

Corresponding author

Correspondence to Patrick Paroubek.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
IMMI-CNRS, Orsay, France
Joseph Mariani

Cite this paper

Pak, A., Paroubek, P., Fraisse, A., Francopoulo, G. (2014). Normalization of Term Weighting Scheme for Sentiment Analysis. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_10

DOI: https://doi.org/10.1007/978-3-319-08958-4_10
Published: 26 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science, Computer Science (R0)
