Cross-lingual sentiment classification with stacked autoencoders

Zhou, Guangyou; Zhu, Zhiyuan; He, Tingting; Hu, Xiaohua Tony

doi:10.1007/s10115-015-0849-0

Regular Paper
Published: 11 June 2015

Volume 47, pages 27–44 (2016)

Knowledge and Information Systems

Guangyou Zhou¹,
Zhiyuan Zhu²,
Tingting He¹ &
Xiaohua Tony Hu¹,³

Abstract

Cross-lingual sentiment classification is a popular research topic in natural language processing. The fundamental challenge of cross-lingual learning stems from the lack of overlap between the feature spaces of the source-language data and the target-language data. In this article, we propose a new model that uses stacked autoencoders to learn language-independent, high-level feature representations for both languages in an unsupervised fashion. The proposed framework aims to force aligned bilingual input sentences into a common latent space; its objective function minimizes the reconstruction error between the input and output vector representations as well as the distance between the common representations in the latent space. With these language-independent high-level feature representations, sentiment classifiers trained on the source language can be adapted to predict the sentiment polarity of target-language text. We conduct extensive experiments on English–Chinese sentiment classification tasks using multiple data sets. Our experimental results demonstrate the efficacy of the proposed cross-lingual approach.
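The objective described in the abstract (reconstruction error for each language plus the distance between the two latent codes) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify layer sizes, activations, weight sharing, or how the two terms are weighted, so the sigmoid layers, the separate per-language encoder and decoder stacks, the squared-error terms, the weight `lam`, and every function name here (`make_params`, `forward`, `bilingual_loss`) are hypothetical. The code only evaluates the loss for one aligned sentence pair; training (for example by gradient descent over many pairs) is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """One dense layer: W has n_out rows of n_in weights; b has n_out biases."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


def init_stack(dims, rng):
    """A stack of layers mapping dims[0] -> dims[1] -> ... -> dims[-1]."""
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(dims, dims[1:])]


def forward(stack, x):
    """Apply each sigmoid layer of the stack in turn (used for encoders and decoders)."""
    h = x
    for W, b in stack:
        h = [sigmoid(sum(w * v for w, v in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
    return h


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - c) ** 2 for a, c in zip(u, v))


def make_params(vocab_src, vocab_tgt, hidden, rng):
    """Assumption: separate stacks per language that share only the latent size."""
    src_dims = [vocab_src] + hidden
    tgt_dims = [vocab_tgt] + hidden
    return {
        "enc_src": init_stack(src_dims, rng),
        "enc_tgt": init_stack(tgt_dims, rng),
        "dec_src": init_stack(src_dims[::-1], rng),  # mirror of the encoder
        "dec_tgt": init_stack(tgt_dims[::-1], rng),
    }


def bilingual_loss(params, x_src, x_tgt, lam=1.0):
    """Reconstruction error of both sentences plus lam * latent-space distance."""
    h_src = forward(params["enc_src"], x_src)
    h_tgt = forward(params["enc_tgt"], x_tgt)
    rec = (sq_dist(x_src, forward(params["dec_src"], h_src))
           + sq_dist(x_tgt, forward(params["dec_tgt"], h_tgt)))
    return rec + lam * sq_dist(h_src, h_tgt)


rng = random.Random(0)
params = make_params(vocab_src=6, vocab_tgt=5, hidden=[4, 3], rng=rng)
x_en = [1, 0, 1, 0, 0, 1]  # toy bag-of-words vector for an English sentence
x_zh = [0, 1, 1, 0, 1]     # toy bag-of-words vector for its aligned Chinese sentence
print(bilingual_loss(params, x_en, x_zh, lam=1.0))
```

Under these assumptions, once the stacks were trained one would fit any classifier on `forward(params["enc_src"], x)` features of labelled source-language reviews and apply it to `forward(params["enc_tgt"], y)` features of target-language reviews; the `lam` term is what pulls the two languages' latent codes together so that such a transfer is plausible.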

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61303180, 61272332, 61402191), the Beijing Natural Science Foundation (No. 4144087), the Major Project of the National Social Science Fund (No. 12&ZD223), and the Fundamental Research Funds for the Central Universities (No. CCNU15ZD003), and was also sponsored by the CCF-Tencent Open Research Fund. We thank the anonymous reviewers for their insightful comments.

Author information

Authors and Affiliations

School of Computer, Central China Normal University, Wuhan, 430079, China
Guangyou Zhou, Tingting He & Xiaohua Tony Hu
Chinese Institute of Electronics, Beijing, 100036, China
Zhiyuan Zhu
College of Computing and Informatics, Drexel University, Philadelphia, PA, 19104, USA
Xiaohua Tony Hu

Corresponding author

Correspondence to Guangyou Zhou.

About this article

Cite this article

Zhou, G., Zhu, Z., He, T. et al. Cross-lingual sentiment classification with stacked autoencoders. Knowl Inf Syst 47, 27–44 (2016). https://doi.org/10.1007/s10115-015-0849-0

Received: 08 May 2014
Revised: 15 March 2015
Accepted: 30 May 2015
Published: 11 June 2015
Issue Date: April 2016
DOI: https://doi.org/10.1007/s10115-015-0849-0
