ABSTRACT
Because sentiment classification is highly domain-specific, supervised sentiment classifiers typically require a large amount of newly labeled training data when transferred to another domain. This is the so-called domain-transfer problem. In this work, we attempt to tackle this problem by combining old-domain labeled examples with new-domain unlabeled ones. The basic idea is to use a classifier trained on the old domain to label some informative unlabeled examples in the new domain, and then retrain the base classifier. The experimental results demonstrate that the proposed method dramatically boosts the accuracy of the base sentiment classifier on the new domain.
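The label-then-retrain idea described in the abstract can be illustrated with a minimal self-training sketch. This is not the paper's implementation: the abstract does not say which base classifier is used or how "informative" examples are selected, so the bag-of-words centroid classifier, the cosine-margin selection rule, and all names below (`train_centroids`, `predict`, `self_train`, `per_round`, `rounds`) are assumptions made purely for illustration.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts (assumed representation, not from the paper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(examples):
    """Sum the term counts of each class; examples is a list of (text, label)."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def predict(centroids, text):
    """Return (label, margin): the closest class and its lead over the runner-up."""
    vec = vectorize(text)
    scores = sorted(((cosine(vec, c), label) for label, c in centroids.items()),
                    reverse=True)
    best_score, best_label = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best_label, best_score - runner_up


def self_train(old_labeled, new_unlabeled, per_round=2, rounds=5):
    """Pseudo-label new-domain texts with the current classifier, then retrain.

    Assumption: "informative" is approximated by the largest positive margin.
    """
    train = list(old_labeled)
    pool = list(new_unlabeled)
    for _ in range(rounds):
        centroids = train_centroids(train)
        scored = []
        for text in pool:
            label, margin = predict(centroids, text)
            if margin > 0:  # skip examples the classifier cannot separate
                scored.append((margin, text, label))
        if not scored:
            break
        scored.sort(reverse=True)
        for _margin, text, label in scored[:per_round]:
            train.append((text, label))
            pool.remove(text)
    return train_centroids(train)


if __name__ == "__main__":
    # Toy data, invented for this sketch: movie reviews as the old domain,
    # electronics reviews as the new domain.
    old = [("great plot and great acting", "pos"),
           ("boring plot and terrible acting", "neg")]
    new = ["great battery and sturdy case", "terrible battery and flimsy case",
           "sturdy build", "flimsy build"]
    adapted = self_train(old, new)
    print(predict(adapted, "flimsy build")[0])
```

In this toy setting the old-domain classifier has never seen "flimsy" or "sturdy", so it cannot separate "flimsy build" from "sturdy build". After the first round of pseudo-labeling, those words enter the class centroids and the retrained classifier can label the remaining new-domain texts.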
Index Terms
- Using unlabeled data to handle domain-transfer problem of semantic detection