Content vs. context for sentiment analysis: a comparative analysis over microblogs

Authors: George Papadakis, Konstantinos Tserpes, Theodora Varvarigou

HT '12: Proceedings of the 23rd ACM conference on Hypertext and social media

Pages 187 - 196

https://doi.org/10.1145/2309996.2310028

Published: 25 June 2012

Abstract

Microblog content poses serious challenges to the applicability of traditional sentiment analysis and classification methods, due to its inherent characteristics. To tackle them, we introduce a method that relies on two orthogonal, but complementary sources of evidence: content-based features captured by n-gram graphs and context-based ones captured by polarity ratio. Both are language-neutral and noise-tolerant, guaranteeing high effectiveness and robustness in the settings we are considering. To ensure our approach can be integrated into practical applications with large volumes of data, we also aim at enhancing its time efficiency: we propose alternative sets of features with low extraction cost, explore dimensionality reduction and discretization techniques and experiment with multiple classification algorithms. We then evaluate our methods over a large, real-world data set extracted from Twitter, with the outcomes indicating significant improvements over the traditional techniques.
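The content-based side of the method rests on n-gram graphs. As a rough illustration only, the sketch below builds a character n-gram graph in which nodes are character n-grams and weighted edges count how often two n-grams occur within a small window of each other. The function names, the default `n` and `window` values, and the simple containment-style similarity are assumptions made for this example; they are not taken from the paper and may differ from the authors' implementation.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_graph(text, n=3, window=3):
    """Build an undirected n-gram graph as a Counter of weighted edges.

    Each edge is a sorted pair of n-grams; its weight is the number of
    times the two n-grams appear within `window` positions of each other.
    Both parameters are illustrative defaults, not values from the paper.
    """
    grams = char_ngrams(text, n)
    edges = Counter()
    for i, gram in enumerate(grams):
        for j in range(i + 1, min(i + 1 + window, len(grams))):
            edges[tuple(sorted((gram, grams[j])))] += 1
    return edges


def containment_similarity(g1, g2):
    """Share of edges common to both graphs, relative to the smaller graph.

    A simplified, assumed similarity measure used only for illustration.
    """
    if not g1 or not g2:
        return 0.0
    shared = sum(1 for edge in g1 if edge in g2)
    return shared / min(len(g1), len(g2))


if __name__ == "__main__":
    # Hypothetical example messages; no real data set is used here.
    positive_model = ngram_graph("i love this so much, great day")
    tweet = ngram_graph("love this great phone")
    print(containment_similarity(tweet, positive_model))
```

Because the graph is built from raw characters rather than words, a sketch like this needs no tokenizer or lexicon, which is consistent with the abstract's description of the features as language-neutral and noise-tolerant.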



Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction


Published In

HT '12: Proceedings of the 23rd ACM conference on Hypertext and social media

June 2012

340 pages

ISBN: 9781450313353

DOI: 10.1145/2309996

General Chair: Ethan Munson, University of Wisconsin - Milwaukee, USA

Program Chair: Markus Strohmaier, Graz University of Technology, Austria

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Conference

HT '12

Sponsor:

SIGWEB

HT '12: 23rd ACM Conference on Hypertext and Social Media

June 25 - 28, 2012

Milwaukee, Wisconsin, USA

Acceptance Rates

HT '12 paper acceptance rate: 33 of 120 submissions (28%)

Overall acceptance rate: 378 of 1,158 submissions (33%)

