DOI: 10.1145/2505515.2505569
research-article

Adaptive co-training SVM for sentiment classification on tweets

Published: 27 October 2013

ABSTRACT

Sentiment classification is an important problem in tweet mining. Twitter lacks labeled data and offers no rating mechanism for generating it. Topics on Twitter are also more diverse, while sentiment classifiers usually target a specific domain or topic. It is therefore a challenge to make sentiment classification adaptive to diverse topics without sufficient labeled data. We formally propose an adaptive multiclass SVM model that transfers an initial common sentiment classifier to a topic-adaptive one. To tackle tweet sparsity, non-text features are explored in addition to the conventional text features, and the features are intuitively split into two views. An iterative algorithm is proposed for solving this model by alternating among three steps: optimization, unlabeled data selection, and adaptive feature expansion. The algorithm alternately minimizes two independent margin-based objectives on the different views to learn coefficient matrices, which are used collaboratively to select unlabeled tweets from the topic that the algorithm is adapting to. Topic-adaptive sentiment words are then expanded based on this selection, which in turn helps the first two steps find more confident unlabeled tweets and boosts the final performance. Compared with well-known supervised sentiment classifiers and semi-supervised approaches, our algorithm achieves promising increases in average accuracy on the 6 topics from a public tweet corpus.
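
The three alternating steps in the abstract can be illustrated with a small co-training-style loop. This is only a sketch under stated assumptions, not the paper's model: a nearest-centroid classifier stands in for the adaptive multiclass SVM, the non-text view is a hypothetical two-number emoticon count, and the names `co_train`, `min_margin`, and `min_count`, the selection rule (both views agree and both are confident), and the expansion rule (words seen in selected tweets of one class only) are all illustrative choices.

```python
from collections import Counter

def text_vec(tokens, vocab):
    # Text view: counts of the current sentiment vocabulary in one tweet.
    c = Counter(tokens)
    return [float(c[w]) for w in vocab]

def centroids(vectors, labels):
    # Stand-in learner: one mean vector per class (NOT the paper's SVM).
    sums, counts = {}, Counter(labels)
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
    return {y: [x / counts[y] for x in s] for y, s in sums.items()}

def predict(model, v):
    # Score = negative squared distance; margin = best score minus runner-up.
    s = {y: -sum((a - b) ** 2 for a, b in zip(c, v)) for y, c in model.items()}
    ranked = sorted(s, key=s.get, reverse=True)
    margin = s[ranked[0]] - s[ranked[1]] if len(ranked) > 1 else 0.0
    return ranked[0], margin

def co_train(labeled, unlabeled, vocab, rounds=5, min_margin=0.5, min_count=2):
    labeled, unlabeled, vocab = list(labeled), list(unlabeled), list(vocab)
    for _ in range(rounds):
        ys = [y for _, _, y in labeled]
        # Step 1 (optimization): fit one model per view.
        m_text = centroids([text_vec(t, vocab) for t, _, _ in labeled], ys)
        m_meta = centroids([m for _, m, _ in labeled], ys)
        # Step 2 (selection): keep tweets where both views agree confidently.
        picked, rest = [], []
        for tokens, meta in unlabeled:
            y1, g1 = predict(m_text, text_vec(tokens, vocab))
            y2, g2 = predict(m_meta, meta)
            if y1 == y2 and min(g1, g2) >= min_margin:
                picked.append((tokens, meta, y1))
            else:
                rest.append((tokens, meta))
        if not picked:
            break
        labeled += picked
        unlabeled = rest
        # Step 3 (feature expansion): add words that occur in newly selected
        # tweets of exactly one class, in at least min_count of those tweets.
        per_word = {}
        for tokens, _, y in picked:
            for w in set(tokens):
                per_word.setdefault(w, Counter())[y] += 1
        for w, c in sorted(per_word.items()):
            if w not in vocab and len(c) == 1 and sum(c.values()) >= min_count:
                vocab.append(w)
    return labeled, unlabeled, vocab

# Toy data: (tokens, [positive emoticons, negative emoticons], label).
LABELED = [(["good", "movie"], [1.0, 0.0], "pos"),
           (["bad", "movie"], [0.0, 1.0], "neg")]
UNLABELED = [(["good", "epic", "trailer"], [1.0, 0.0]),
             (["good", "epic"], [1.0, 0.0]),
             (["bad", "laggy"], [0.0, 1.0]),
             (["laggy", "bad"], [0.0, 1.0]),
             (["epic", "launch"], [1.0, 0.0]),
             (["good"], [0.0, 1.0])]
labeled_out, rest_out, vocab_out = co_train(LABELED, UNLABELED, ["good", "bad"])
```

In this toy run the tweet `["epic", "launch"]` contains no word from the initial vocabulary, so the text view cannot place it confidently in the first round. It is selected only after "epic" has been added in the expansion step, which is the feedback effect the abstract describes. The tweet on which the two views disagree is never selected.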

Published in

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515

Copyright © 2013 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
