Incorporating Sample Filtering into Subject-Based Ensemble Model for Cross-Domain Sentiment Classification

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (CCL 2015, NLP-NABD 2015)

Abstract

Cross-domain sentiment classification has recently become popular owing to its potential applications, such as marketing. It seeks to generalize a model trained on a source domain so that it can label samples in a target domain. However, the source and target distributions differ substantially in many cases. To address this issue, we propose a comprehensive model that takes sample filtering and labeling adaptation into account simultaneously, named joint Sample Filtering with Subject-based Ensemble Model (SF-SE). First, we introduce a sentence-level Latent Dirichlet Allocation (LDA) model that incorporates topic and sentiment together (SS-LDA). Under this model, a high-quality training dataset is constructed in an unsupervised way. Second, inspired by the distribution variance of domain-independent and domain-specific features related to the subject of a sentence, we introduce a Subject-based Ensemble model to efficiently improve classification performance. Experimental results show that the proposed model is effective for cross-domain sentiment classification.

Acknowledgements

This work is partially supported by grants from the Natural Science Foundation of China (No. 61277370, 61402075), the Natural Science Foundation of Liaoning Province, China (No. 201202031, 2014020003), the State Education Ministry and the Research Fund for the Doctoral Program of Higher Education (No. 20090041110002), and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Hongfei Lin.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yang, L., Zhang, S., Lin, H., Wei, X. (2015). Incorporating Sample Filtering into Subject-Based Ensemble Model for Cross-Domain Sentiment Classification. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2015, NLP-NABD 2015. Lecture Notes in Computer Science, vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_10

  • DOI: https://doi.org/10.1007/978-3-319-25816-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25815-7

  • Online ISBN: 978-3-319-25816-4

  • eBook Packages: Computer Science, Computer Science (R0)
