ABSTRACT
Sentiment classification is an important problem in tweet mining. Twitter offers neither labeled data nor a rating mechanism for generating it, and topics on Twitter are diverse, whereas sentiment classifiers are usually dedicated to a specific domain or topic. It is therefore a challenge to make sentiment classification adapt to diverse topics without sufficient labeled data. We formally propose an adaptive multiclass SVM model that transfers an initial common sentiment classifier to a topic-adaptive one. To tackle the sparsity of tweets, non-text features are explored in addition to the conventional text features, and the two feature sets are naturally split into two views. An iterative algorithm solves this model by alternating among three steps: optimization, unlabeled data selection, and adaptive feature expansion. The algorithm alternately optimizes two independent margin-based objectives, one on each view, to learn coefficient matrices, which are used collaboratively to select unlabeled tweets from the topic that the algorithm is adapting to. Topic-adaptive sentiment words are then expanded based on this selection, which in turn helps the first two steps find more confidently classified unlabeled tweets and boosts the final performance. Compared with well-known supervised sentiment classifiers and semi-supervised approaches, our algorithm achieves promising increases in average accuracy on the 6 topics from a public tweet corpus.
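The three alternating steps can be illustrated with a minimal sketch. This is not the model from the paper: a nearest-centroid scorer stands in for the adaptive multiclass SVM, "both views agree with a sufficient margin" is an assumed selection rule, and adding every word of a selected tweet to the vocabulary is a crude stand-in for expanding topic-adaptive sentiment words. All names (`train_centroids`, `predict`, `adapt`, `min_margin`) are hypothetical.

```python
from collections import Counter, defaultdict

def train_centroids(examples, labels, vocab=None):
    """Average the feature vectors of each class (stand-in for the SVM)."""
    sums, counts = defaultdict(Counter), Counter()
    for feats, y in zip(examples, labels):
        counts[y] += 1
        for f, v in feats.items():
            if vocab is None or f in vocab:
                sums[y][f] += v
    return {y: {f: v / counts[y] for f, v in sums[y].items()} for y in counts}

def predict(centroids, feats):
    """Return (label, margin): the best class and its lead over the runner-up."""
    scored = sorted(
        ((sum(c.get(f, 0.0) * v for f, v in feats.items()), y)
         for y, c in centroids.items()),
        key=lambda pair: pair[0], reverse=True)
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return scored[0][1], scored[0][0] - runner_up

def adapt(labeled, unlabeled, vocab, rounds=3, min_margin=0.5):
    """labeled: (text_feats, nontext_feats, label); unlabeled: (text_feats, nontext_feats)."""
    labeled, pool, vocab = list(labeled), list(unlabeled), set(vocab)

    def fit():
        ys = [y for _, _, y in labeled]
        return (train_centroids([t for t, _, _ in labeled], ys, vocab),
                train_centroids([m for _, m, _ in labeled], ys))

    for _ in range(rounds):
        # Step 1 (optimization): fit one model per view.
        text_model, nontext_model = fit()
        # Step 2 (selection): keep tweets where both views agree confidently.
        chosen, rest = [], []
        for t, m in pool:
            (y_text, m_text), (y_non, m_non) = predict(text_model, t), predict(nontext_model, m)
            if y_text == y_non and min(m_text, m_non) >= min_margin:
                chosen.append((t, m, y_text))
            else:
                rest.append((t, m))
        if not chosen:
            break
        # Step 3 (feature expansion): admit words seen in the selected tweets.
        for t, _, _ in chosen:
            vocab.update(t)
        labeled += chosen
        pool = rest
    text_model, nontext_model = fit()  # refit on the final augmented set
    return text_model, nontext_model, vocab, labeled
```

In this sketch the text view only scores words already in `vocab`, so a tweet containing only unseen topic words is ignored at first and becomes selectable once an earlier round has pulled those words in. That dependency is what lets the expansion step feed back into selection.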
Index Terms
- Adaptive co-training SVM for sentiment classification on tweets