Abstract
Recently, cross-domain sentiment classification is becoming popular owing to its potential applications, such as marketing et al. It seeks to generalize a model, which is trained on a source domain and using it to label samples in the target domain. However, the source and target distributions differ substantially in many cases. To address this issue, we propose a comprehensive model, which takes sample filtering and labeling adaptation into account simultaneously, named joint Sample Filtering with Subject-based Ensemble Model (SF-SE). Firstly, a sentence level Latent Dirichlet Allocation (LDA) model, which incorporates topic and sentiment together (SS-LDA) is introduced. Under this model, a high-quality training dataset is constructed in an unsupervised way. Secondly, inspired by the distribution variance of domain-independent and domain-specific features related to the subject of a sentence, we introduce a Subject-based Ensemble model to efficiently improve the classification performance. Experimental results show that the proposed model is effective for cross-domain sentiment classification.
References
Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: Sentiment classification using machine learning techniques. In: Proceedings of the ACL-2002 Conference on Empirical Methods in Natural Language Processing, pp. 79–86 (2002)
Pang, B., Lee, L.: Opinion mining and sentiment analysis. J. Found. Trends Inf. Retrieval 2, 1–135 (2008)
Liu, Y., Huang, X., An, A., Yu, X.: ARSA: a sentiment-aware model for predicting sales performance using blogs. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 607–614 (2007)
Yu, X., Liu, Y., Huang, X., An, A.: Mining online reviews for predicting sales performance: a case study in the movie domain. IEEE Trans. J. Knowl. Data Eng. 24, 720–734 (2012)
Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. J. Knowl. Data Eng. 22, 1345–1359 (2010)
Blitzer, J., McDonald, R., Pereira, F.: Domain adaptation with structural correspondence learning. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 120–128 (2006)
Blitzer J., Dredze M., Pereira, F.: Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, vol. 7, pp. 440–447 (2007)
Pan, S. J., Ni, X., Sun, J., Yang, Q., Chen, Z.: Cross-domain sentiment classification via spectral feature alignment. In: Proceedings of the 19th International Conference on World Wide Web, pp. 751–760 (2010)
He, Y., Lin, C., Alani, H.: Automatically extracting polarity-bearing topics for cross-domain sentiment classification. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 123–131 (2011)
Duan, L., Xu, D., Tsang, I.: Learning with augmented features for heterogeneous domain adaptation. J. arXiv preprint (2012). arXiv:1206.4660
Jiang, J., Zhai, C.: Instance weighting for domain adaptation in NLP. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, vol. 7, pp. 264–271 (2007)
Xia, R., Zong, C.: A POS-based ensemble model for cross-domain sentiment classification. In: Proceedings of the 5th International Joint Conference on Natural Language Processing, pp. 614–622. Citeseer (2011)
Samdani, R., Yih, W.: Domain adaptation with ensemble of feature groups. In: Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, p. 1458 (2011)
Gao, J., Fan, W., Jiang, J., Han, J.: Knowledge transfer via multiple model local structure mapping. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 283–291 (2008)
Yoshida, Y., Hirao, T., Iwata, T., Nagata, M., Matsumoto, Y.: Twenty-Fifth AAAI Conference on Artificial Intelligence (2011)
Lin, C., He, Y.: Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 375–384 (2009)
Xia, R., Zong, C., Hu, X., Cambria, E.: Feature ensemble plus sample selection: domain adaptation for sentiment classification. J. Intell. Syst. 28, 10–18 (2013)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Lu, B., Ott, M., Cardie, C., Tsou, B.K.: Multi-aspect sentiment analysis with topic models. In: IEEE 11th International Conference on Data Mining Workshops (ICDMW), pp. 81–88 (2011)
Lin, J.: Divergence measures based on the Shannon entropy. IEEE Trans. J. Inf. Theo. 37, 145–151 (1991)
Fumera, G., Roli, F.: A theoretical and experimental analysis of linear combiners for multiple classifier systems. IEEE Trans. J. Pattern Analy. Mach. Intell. 27, 942–956 (2005)
Juang, B.H., Katagiri, S.: Discriminative learning for minimum error classification. IEEE Trans. J. Signal Process. 40, 3043–3054 (1992)
Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, p. 271 (2004)
Acknowledgements
This work is partially supported by grant from the Natural Science Foundation of China (No. 61277370, 61402075), Natural Science Foundation of Liaoning Province, China (No. 201202031, 2014020003), State Education Ministry and The Research Fund for the Doctoral Program of Higher Education (No. 20090041110002), the Fundamental Research Funds for the Central Universities.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yang, L., Zhang, S., Lin, H., Wei, X. (2015). Incorporating Sample Filtering into Subject-Based Ensemble Model for Cross-Domain Sentiment Classification. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2015 2015. Lecture Notes in Computer Science(), vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_10
Download citation
DOI: https://doi.org/10.1007/978-3-319-25816-4_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25815-7
Online ISBN: 978-3-319-25816-4
eBook Packages: Computer ScienceComputer Science (R0)