DOI: 10.1145/2309996.2310028
research-article

Content vs. context for sentiment analysis: a comparative analysis over microblogs

Published: 25 June 2012

Abstract

Microblog content poses serious challenges to traditional sentiment analysis and classification methods because of its inherent characteristics. To tackle them, we introduce a method that relies on two orthogonal but complementary sources of evidence: content-based features captured by n-gram graphs and context-based features captured by polarity ratio. Both are language-neutral and noise-tolerant, guaranteeing high effectiveness and robustness in the settings we consider. To ensure that our approach can be integrated into practical applications with large volumes of data, we also aim to improve its time efficiency: we propose alternative sets of features with low extraction cost, explore dimensionality reduction and discretization techniques, and experiment with multiple classification algorithms. We then evaluate our methods over a large, real-world data set extracted from Twitter; the outcomes indicate significant improvements over traditional techniques.
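
The two kinds of evidence named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define either feature precisely, so the n-gram size, the window, the overlap measure, and the polarity-ratio formula below are all assumptions chosen for illustration, and the function names `ngram_graph`, `edge_overlap`, and `polarity_ratio` are hypothetical.

```python
from collections import Counter


def ngram_graph(text, n=3, window=3):
    """Build a character n-gram graph as a Counter of weighted edges.

    Nodes are character n-grams. An undirected edge links two n-grams whose
    start positions are at most `window` apart; its weight counts how often
    that happens. `n` and `window` are illustrative defaults (assumptions).
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        last = min(i + window, len(grams) - 1)
        for j in range(i + 1, last + 1):
            edges[tuple(sorted((gram, grams[j])))] += 1
    return edges


def edge_overlap(graph_a, graph_b):
    """Share of edges common to both graphs, relative to the smaller graph.

    A simple overlap measure assumed here for illustration only.
    """
    if not graph_a or not graph_b:
        return 0.0
    shared = sum(1 for edge in graph_a if edge in graph_b)
    return shared / min(len(graph_a), len(graph_b))


def polarity_ratio(positive, negative):
    """Hypothetical context feature: balance of positive vs. negative
    messages in a tweet's social context, in [-1, 1]. The formula is an
    assumption, not the definition used in the paper.
    """
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


if __name__ == "__main__":
    # Compare a new message against a graph built from known positive text.
    positive_graph = ngram_graph("i loooove this phone :)")
    message_graph = ngram_graph("love this phone")
    print(edge_overlap(message_graph, positive_graph))
    print(polarity_ratio(positive=3, negative=1))
```

Because the graph works on raw characters rather than dictionary words, misspellings and elongated words still share many edges with their standard forms, which is one plausible reading of the "noise-tolerant" and "language-neutral" properties the abstract claims.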

      Published In

      HT '12: Proceedings of the 23rd ACM conference on Hypertext and social media
      June 2012
      340 pages
      ISBN:9781450313353
      DOI:10.1145/2309996
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. n-gram graphs
      2. sentiment analysis
      3. social context

      Qualifiers

      • Research-article

      Conference

      HT '12: 23rd ACM Conference on Hypertext and Social Media
      June 25 - 28, 2012
      Milwaukee, Wisconsin, USA

      Acceptance Rates

      HT '12 Paper Acceptance Rate 33 of 120 submissions, 28%;
      Overall Acceptance Rate 378 of 1,158 submissions, 33%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (Last 12 months): 34
      • Downloads (Last 6 weeks): 5
      Reflects downloads up to 01 Mar 2025

      Cited By

      • (2021) Aspect-Based Sentiment Analysis for User Reviews. Cognitive Computation 13:5 (1114-1127). DOI: 10.1007/s12559-021-09855-4. Online publication date: 13-Jul-2021
      • (2021) Over a decade of social opinion mining: a systematic review. Artificial Intelligence Review 54:7 (4873-4965). DOI: 10.1007/s10462-021-10030-2. Online publication date: 1-Oct-2021
      • (2020) Four Types of Toxic People: Characterizing Online Users’ Toxicity over Time. Proceedings of the 11th Nordic Conference on Human-Computer Interaction: Shaping Experiences, Shaping Society (1-11). DOI: 10.1145/3419249.3420142. Online publication date: 25-Oct-2020
      • (2020) Characteristics of Similar-Context Trending Hashtags in Twitter: A Case Study. Web Services – ICWS 2020 (150-163). DOI: 10.1007/978-3-030-59618-7_10. Online publication date: 19-Sep-2020
      • (2020) How do Mainland Chinese tourists perceive Hong Kong in turbulence? A deep learning approach to sentiment analytics. International Journal of Tourism Research 23:4 (478-490). DOI: 10.1002/jtr.2419. Online publication date: 21-Oct-2020
      • (2019) Emoji Prediction for Hebrew Political Domain. Companion Proceedings of The 2019 World Wide Web Conference (468-477). DOI: 10.1145/3308560.3316548. Online publication date: 13-May-2019
      • (2019) GeoSensor. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (2259-2266). DOI: 10.1145/3297280.3297504. Online publication date: 8-Apr-2019
      • (2019) Leveraging Social Media Linguistic Features for Bilingual Microblog Sentiment Classification. 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA) (1-8). DOI: 10.1109/IISA.2019.8900674. Online publication date: Jul-2019
      • (2019) Systematic literature review on context-based sentiment analysis in social multimedia. Multimedia Tools and Applications. DOI: 10.1007/s11042-019-7346-5. Online publication date: 23-Feb-2019
      • (2018) Text Classification Using the N-Gram Graph Representation Model Over High Frequency Data Streams. Frontiers in Applied Mathematics and Statistics 4. DOI: 10.3389/fams.2018.00041. Online publication date: 11-Sep-2018
