research-article

Adaptive co-training SVM for sentiment classification on tweets

Authors:
Shenghua Liu (Institute of Computing Technology, CAS, Beijing, China)
Fuxin Li (Institute of Computing Technology, CAS, Beijing, China)
Fangtao Li (Google Inc., Mountain View, CA, USA)
Xueqi Cheng (Institute of Computing Technology, CAS, Beijing, China)
Huawei Shen (Institute of Computing Technology, CAS, Beijing, China)

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, October 2013, Pages 2079–2088
https://doi.org/10.1145/2505515.2505569

Published: 27 October 2013

ABSTRACT

Sentiment classification is an important problem in tweet mining. Twitter lacks labeled data and offers no rating mechanism for generating them. Moreover, topics on Twitter are highly diverse, whereas sentiment classifiers are usually tailored to a specific domain or topic. It is therefore a challenge to make sentiment classification adapt to diverse topics without sufficient labeled data. We formally propose an adaptive multiclass SVM model that transfers an initial common sentiment classifier into a topic-adaptive one. To tackle the sparsity of tweets, non-text features are explored alongside the conventional text features, and the two kinds of features are naturally split into two views. An iterative algorithm solves this model by alternating among three steps: optimization, unlabeled data selection, and adaptive feature expansion. The algorithm alternately optimizes two independent margin-based objectives, one per view, to learn coefficient matrices, which are then used collaboratively to select unlabeled tweets from the topic that the algorithm is adapting to. Topic-adaptive sentiment words are then expanded based on this selection, which in turn helps the first two steps find more confidently classified unlabeled tweets and boosts the final performance. Compared with well-known supervised sentiment classifiers and semi-supervised approaches, our algorithm achieves promising increases in average accuracy over the 6 topics from a public tweet corpus.
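
The alternation between optimization and unlabeled data selection described in the abstract can be illustrated with a minimal co-training sketch. This is not the authors' model: the classifier below is a plain linear model trained by subgradient steps on the multiclass hinge loss, standing in for the adaptive multiclass SVM, and the adaptive feature expansion step is omitted. All function names, parameters (`per_round`, `min_margin`), and the toy two-view data are assumptions made for illustration.

```python
def scores(W, x):
    """Per-class linear scores W[c] . x."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]

def margin(s):
    """Gap between the best and the second-best class score."""
    top, second = sorted(s, reverse=True)[:2]
    return top - second

def argmax(s):
    return max(range(len(s)), key=lambda c: s[c])

def train_linear(X, y, n_classes, epochs=20, lr=0.1):
    """Stand-in for a multiclass SVM (assumption): subgradient steps on
    the multiclass hinge loss, with no regularizer."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            s = scores(W, x)
            # Highest-scoring wrong class.
            rival = max((c for c in range(n_classes) if c != label),
                        key=lambda c: s[c])
            if s[label] - s[rival] < 1.0:  # margin violated
                for j, x_j in enumerate(x):
                    W[label][j] += lr * x_j
                    W[rival][j] -= lr * x_j
    return W

def co_train(text_L, meta_L, y_L, text_U, meta_U, n_classes,
             rounds=5, per_round=2, min_margin=0.5):
    """Alternate between (1) fitting one classifier per view and
    (2) moving confidently classified unlabeled items to the labeled set."""
    text_L, meta_L, y_L = list(text_L), list(meta_L), list(y_L)
    pool = list(range(len(text_U)))
    pseudo = {}  # unlabeled index -> pseudo-label
    for _ in range(rounds):
        W_text = train_linear(text_L, y_L, n_classes)
        W_meta = train_linear(meta_L, y_L, n_classes)
        candidates = []
        for i in pool:
            s_t, s_m = scores(W_text, text_U[i]), scores(W_meta, meta_U[i])
            if argmax(s_t) != argmax(s_m):
                continue  # the two views disagree
            conf = min(margin(s_t), margin(s_m))
            if conf >= min_margin:
                candidates.append((conf, i, argmax(s_t)))
        chosen = sorted(candidates, reverse=True)[:per_round]
        if not chosen:
            break
        for _, i, label in chosen:
            text_L.append(text_U[i])
            meta_L.append(meta_U[i])
            y_L.append(label)
            pseudo[i] = label
            pool.remove(i)
    # Final fit on the labeled data plus all pseudo-labeled items.
    W_text = train_linear(text_L, y_L, n_classes)
    W_meta = train_linear(meta_L, y_L, n_classes)
    return W_text, W_meta, pseudo

def predict(W_text, W_meta, x_text, x_meta):
    """Combine the two views by summing their class scores."""
    s_t, s_m = scores(W_text, x_text), scores(W_meta, x_meta)
    return argmax([a + b for a, b in zip(s_t, s_m)])

# Toy data (hypothetical): two labeled and four unlabeled items, two views each.
text_L = [[1.0, 0.0], [0.0, 1.0]]
meta_L = [[1.0, 0.2], [0.1, 1.0]]
y_L = [0, 1]
text_U = [[0.9, 0.1], [0.1, 0.8], [0.8, 0.0], [0.5, 0.5]]
meta_U = [[0.9, 0.1], [0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]

W_text, W_meta, pseudo = co_train(text_L, meta_L, y_L, text_U, meta_U, n_classes=2)
print(pseudo)
```

Requiring both views to agree and taking the smaller of the two margins is one simple way to use the two classifiers collaboratively; other selection rules would fit the same loop.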

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources

Published in
CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515
General Chairs: Qi He (LinkedIn, USA), Arun Iyengar (IBM T.J. Watson Research Center, USA)
Program Chairs: Wolfgang Nejdl (L3S Research Center, Germany), Jian Pei (Simon Fraser University, Canada), Rajeev Rastogi (Amazon, India)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
co-training
multiclass svm
semi-supervised learning
sentiment classification
topic-adaptive
tweet sentiment
Acceptance Rates
CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%