Cross-lingual sentiment classification with stacked autoencoders

  • Regular Paper
Knowledge and Information Systems

Abstract

Cross-lingual sentiment classification is a popular research topic in natural language processing. The fundamental challenge of cross-lingual learning stems from the lack of overlap between the feature spaces of the source-language data and the target-language data. In this article, we propose a new model that uses stacked autoencoders to learn language-independent high-level feature representations for both languages in an unsupervised fashion. The proposed framework aims to force aligned bilingual input sentences into a common latent space; its objective function minimizes the difference between the input and output vector representations as well as the distance between the two languages' representations in the latent space. With these language-independent high-level feature representations, sentiment classifiers trained on the source language can be adapted to predict the sentiment polarity of target-language text. We conduct extensive experiments on English–Chinese sentiment classification tasks over multiple data sets. Our experimental results demonstrate the efficacy of the proposed cross-lingual approach.
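
The objective described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: a single sigmoid layer per language instead of a full stack, squared Euclidean distance for both the reconstruction and alignment terms, untied weights, and a hypothetical weight `alpha` that balances the two terms. The function names are also invented for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_autoencoder(rng, n_in, n_hidden):
    """Parameters of a one-hidden-layer autoencoder (illustrative, untied weights)."""
    scale = 1.0 / np.sqrt(n_in)
    return {
        "W_enc": rng.normal(0.0, scale, (n_in, n_hidden)),
        "b_enc": np.zeros(n_hidden),
        "W_dec": rng.normal(0.0, scale, (n_hidden, n_in)),
        "b_dec": np.zeros(n_in),
    }


def encode(params, x):
    """Map input vectors to the latent space."""
    return sigmoid(x @ params["W_enc"] + params["b_enc"])


def decode(params, h):
    """Reconstruct input vectors from latent codes."""
    return sigmoid(h @ params["W_dec"] + params["b_dec"])


def bilingual_loss(src_params, tgt_params, x_src, x_tgt, alpha=1.0):
    """Reconstruction error for each language plus a latent alignment term.

    Row i of x_src and row i of x_tgt are assumed to be an aligned sentence pair.
    """
    h_src = encode(src_params, x_src)
    h_tgt = encode(tgt_params, x_tgt)
    rec_src = np.mean(np.sum((decode(src_params, h_src) - x_src) ** 2, axis=1))
    rec_tgt = np.mean(np.sum((decode(tgt_params, h_tgt) - x_tgt) ** 2, axis=1))
    # Distance between the two languages' codes for each aligned pair.
    align = np.mean(np.sum((h_src - h_tgt) ** 2, axis=1))
    return rec_src + rec_tgt + alpha * align


# Toy usage: 4 aligned sentence pairs with different vocabulary sizes.
rng = np.random.default_rng(0)
x_en = rng.random((4, 50))   # stand-in for source-language sentence vectors
x_zh = rng.random((4, 60))   # stand-in for target-language sentence vectors
en_ae = init_autoencoder(rng, n_in=50, n_hidden=16)
zh_ae = init_autoencoder(rng, n_in=60, n_hidden=16)
loss = bilingual_loss(en_ae, zh_ae, x_en, x_zh, alpha=1.0)
```

Because both encoders map into a latent space of the same dimension, a classifier trained on `encode(en_ae, ...)` features could in principle be applied to `encode(zh_ae, ...)` features once the alignment term has been minimized; training itself (gradient computation and layer-wise stacking) is omitted from this sketch.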

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61303180, 61272332, 61402191), the Beijing Natural Science Foundation (No. 4144087), the Major Project of the National Social Science Fund (No. 12&2D223), and the Fundamental Research Funds for the Central Universities (No. CCNU15ZD003), and was also sponsored by the CCF-Tencent Open Research Fund. We thank the anonymous reviewers for their insightful comments.

Author information

Corresponding author

Correspondence to Guangyou Zhou.

About this article

Cite this article

Zhou, G., Zhu, Z., He, T. et al. Cross-lingual sentiment classification with stacked autoencoders. Knowl Inf Syst 47, 27–44 (2016). https://doi.org/10.1007/s10115-015-0849-0

  • DOI: https://doi.org/10.1007/s10115-015-0849-0
