ABSTRACT
This paper describes preliminary work on feature selection for classifying review text by both sentiment rating and topic. The premise is that one size does not fit all: feature sets for sentiment analysis should be tailored to the topic of a text. For this to be effective, the topic of a text must first be determined. Following successful work on classifying texts by author demographics, a corpus of review texts labelled with attributed rating, topic area, and user demographics has been compiled. For this work, the collection was divided into topic groups so that texts could be automatically classified by both topic and subjective rating. Using a single supervised statistical approach to feature selection, it is shown that topic-tuned feature sets can improve classification accuracy over more generic features.
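The idea of topic-tuned feature selection can be illustrated with a minimal sketch. The abstract does not name the statistic used, so the log-likelihood (G2) score below is an assumption chosen for illustration, and the toy reviews, the function names `log_likelihood` and `select_features`, and the whitespace tokenisation are all hypothetical. The sketch ranks terms by how strongly they separate positive from negative reviews within a single topic, so the same word can be selected with opposite polarity in different topics.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 score for a term seen a times in corpus 1 (c tokens in total)
    and b times in corpus 2 (d tokens in total).
    Assumption: the paper's actual statistic is not stated in the abstract."""
    e1 = c * (a + b) / (c + d)  # expected count in corpus 1
    e2 = d * (a + b) / (c + d)  # expected count in corpus 2
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def select_features(pos_docs, neg_docs, k):
    """Return the k highest-scoring (term, polarity) pairs for one topic."""
    pos = Counter(tok for doc in pos_docs for tok in doc.lower().split())
    neg = Counter(tok for doc in neg_docs for tok in doc.lower().split())
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    scored = []
    for term in set(pos) | set(neg):
        score = log_likelihood(pos[term], neg[term], n_pos, n_neg)
        # Polarity is the class in which the term is relatively more frequent.
        polarity = "pos" if pos[term] / n_pos > neg[term] / n_neg else "neg"
        scored.append((score, term, polarity))
    # Sort by descending score, breaking ties alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(term, polarity) for _, term, polarity in scored[:k]]


# Hypothetical toy reviews, grouped by topic and then by rating.
films_pos = ["scary film great scary plot"]
films_neg = ["dull film boring plot"]
flights_pos = ["smooth flight great crew"]
flights_neg = ["scary flight scary landing"]

film_features = select_features(films_pos, films_neg, 1)
flight_features = select_features(flights_pos, flights_neg, 1)
```

In this toy data, the top-ranked term is "scary" for both topics, but it is associated with positive film reviews and negative flight reviews, which is the kind of topic dependence a single generic feature set would miss.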
Index Terms
- Scary films good, scary flights bad: topic driven feature selection for classification of sentiment