DOI: 10.1145/1651461.1651465
research-article

Scary films good, scary flights bad: topic driven feature selection for classification of sentiment

Published: 06 November 2009

ABSTRACT

This paper describes preliminary work on feature selection for classifying review text by both sentiment rating and topic. The premise is that one size does not fit all: feature sets for sentiment analysis should be tailored to the topic of a text. For this to be effective, the topic of a text must first be determined. Following successful work on classification of texts by author demographics, a corpus of review texts labelled with attributed rating, topic area, and user demographics has been compiled. For this work, the collection was divided into topic groups in order to classify texts automatically by both topic and subjective rating. Using a single supervised statistical approach to feature selection, it is shown that topic-tuned feature sets can improve classification accuracy over more generic features.
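
The idea of topic-tuned feature selection can be illustrated with a minimal sketch. The abstract does not name the statistic used, so the code below assumes a signed log-likelihood (G2) score computed over whitespace tokens with binary "pos"/"neg" ratings. The function names and the four toy reviews are invented for illustration and are not taken from the paper's corpus or method.

```python
import math
from collections import Counter


def signed_g2(a, b, total_a, total_b):
    """Log-likelihood (G2) for a term seen a times in corpus A and b times in corpus B.

    The sign is positive when the term is relatively more frequent in A.
    This scoring choice is an assumption, not the statistic reported in the paper.
    """
    if total_a == 0 or total_b == 0:
        return 0.0
    pooled = (a + b) / (total_a + total_b)  # term rate if both corpora were identical
    g2 = 0.0
    for observed, total in ((a, total_a), (b, total_b)):
        if observed > 0:
            g2 += observed * math.log(observed / (total * pooled))
    g2 *= 2.0
    return g2 if a / total_a >= b / total_b else -g2


def score_terms(reviews):
    """reviews: iterable of (rating, text) pairs. Returns {term: signed score}."""
    pos, neg = Counter(), Counter()
    for rating, text in reviews:
        (pos if rating == "pos" else neg).update(text.lower().split())
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    return {t: signed_g2(pos[t], neg[t], n_pos, n_neg) for t in set(pos) | set(neg)}


def top_features(scores, k):
    """Keep the k terms whose frequency differs most between ratings."""
    return sorted(scores, key=lambda t: (-abs(scores[t]), t))[:k]


# Invented toy data: (topic, rating, text).
REVIEWS = [
    ("film", "pos", "scary scary thrilling film"),
    ("film", "neg", "dull slow film"),
    ("flight", "pos", "smooth quiet flight"),
    ("flight", "neg", "scary scary bumpy flight"),
]

# Generic feature scores: all topics pooled together.
generic = score_terms((rating, text) for _, rating, text in REVIEWS)

# Topic-tuned feature scores: one scoring pass per topic group.
by_topic = {
    topic: score_terms((r, t) for tp, r, t in REVIEWS if tp == topic)
    for topic in ("film", "flight")
}
```

On this toy data, "scary" leans positive within film reviews and negative within flight reviews, while the pooled score cancels out. That is the kind of signal a generic feature set could miss and a topic-specific one could keep.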

Published in

TSA '09: Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion
November 2009
94 pages
ISBN: 9781605588056
DOI: 10.1145/1651461
General Chairs: Maojin Jiang, Bei Yu
Program Chair: Bei Yu

              Copyright © 2009 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States
