Scary films good, scary flights bad: topic driven feature selection for classification of sentiment

Author:
Scott Nowson

Appen Pty. Ltd., Sydney, Australia

TSA '09: Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion, November 2009, pages 17–24. https://doi.org/10.1145/1651461.1651465

Published: 6 November 2009

ABSTRACT

This paper describes preliminary work on feature selection for classifying review text by both sentiment rating and topic. The premise is that one size does not fit all: feature sets for sentiment analysis should be tailored to the topic of a text. It follows that, for this to be effective, the topic of a text must first be determined. Building on successful work on classifying texts by author demographics, a corpus of review texts labelled with the attributed rating, topic area, and user demographics has been compiled. For this work, the collection was divided into topic groups so that texts could be automatically classified by both topic and subjective rating. Using a single supervised statistical approach to feature selection, it is shown that topic-tuned feature sets can improve classification accuracy over more generic feature sets.
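The abstract does not name the statistical measure used for feature selection, so the following is only a minimal sketch of the general idea under stated assumptions. Within each topic group, words are ranked by the log-likelihood (G²) keyness statistic, a common corpus-comparison measure swapped in here purely for illustration, comparing positively and negatively rated reviews; the top k words become that topic's feature set. The toy reviews, the binary "pos"/"neg" labels, and the function names `g2`, `select_features` and `topic_tuned_features` are all hypothetical.

```python
import math
from collections import Counter


def g2(a, b, c, d):
    """Log-likelihood (G2) keyness of a word seen a times in a class with c tokens
    and b times in the contrasting class with d tokens (illustrative choice)."""
    total = a + b
    e1 = c * total / (c + d)  # expected count in the target class
    e2 = d * total / (c + d)  # expected count in the contrasting class
    score = 0.0
    if a > 0:
        score += a * math.log(a / e1)
    if b > 0:
        score += b * math.log(b / e2)
    return 2 * score


def select_features(docs, labels, k, target="pos"):
    """Return the k words whose frequency differs most between target-labelled
    documents and the rest, each paired with True if it leans towards the target."""
    in_counts, out_counts = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (in_counts if label == target else out_counts).update(tokens)
    c, d = sum(in_counts.values()), sum(out_counts.values())
    vocab = set(in_counts) | set(out_counts)
    # Highest G2 first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(vocab, key=lambda w: (-g2(in_counts[w], out_counts[w], c, d), w))
    # Compare relative frequencies by cross-multiplying to avoid dividing by zero.
    return [(w, in_counts[w] * d > out_counts[w] * c) for w in ranked[:k]]


def topic_tuned_features(corpus, k):
    """Select a separate sentiment feature set for each topic, given
    (tokens, topic, polarity) triples (hypothetical data layout)."""
    features = {}
    for topic in sorted({t for _, t, _ in corpus}):
        docs = [toks for toks, t, _ in corpus if t == topic]
        labels = [pol for _, t, pol in corpus if t == topic]
        features[topic] = select_features(docs, labels, k)
    return features


# Hypothetical toy reviews: (tokens, topic, polarity).
toy_corpus = [
    (["scary", "fun"], "film", "pos"),
    (["scary", "great"], "film", "pos"),
    (["boring", "slow"], "film", "neg"),
    (["smooth", "great"], "flight", "pos"),
    (["scary", "delay"], "flight", "neg"),
    (["scary", "rude"], "flight", "neg"),
]

if __name__ == "__main__":
    for topic, feats in topic_tuned_features(toy_corpus, 3).items():
        print(topic, feats)
```

On this toy data, "scary" is selected for both topics but leans positive for films and negative for flights. If the two topics were pooled into one generic feature set, "scary" would occur equally often in positive and negative reviews and carry no signal, which is the kind of topic dependence the title alludes to.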

Published in
TSA '09: Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion
November 2009
94 pages
ISBN:9781605588056
DOI:10.1145/1651461
General Chairs:
Maojin Jiang, Illinois Institute of Technology, USA
Bei Yu, Syracuse University, USA
Program Chair:
Bei Yu, Syracuse University, USA
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
feature selection
sentiment analysis
text classification
topic