research-article

Semi-Supervised Sentiment Classification and Emotion Distribution Learning Across Domains

Authors:

Raymond Y. K. Lau,

Jian Yin

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 5

Article No.: 74, Pages 1 - 30

https://doi.org/10.1145/3571736

Published: 27 February 2023

Abstract

In this study, sentiment classification and emotion distribution learning across domains are both formulated as a semi-supervised domain adaptation problem, in which a small number of labeled documents in the target domain are used for model training. By introducing a shared matrix that captures the stable association between document clusters and word clusters, non-negative matrix tri-factorization (NMTF) is robust to the labeled target domain data and has shown remarkable performance in cross-domain text classification. However, existing NMTF-based models ignore the incompatibility between sentiment polarities and the relatedness among emotions. In addition, their application to large-scale datasets is limited by high computational complexity. To address these issues, we propose a semi-supervised NMTF framework for sentiment classification and emotion distribution learning across domains. Based on a many-to-many mapping between document clusters and sentiment polarities (or emotions), we first incorporate prior information on label dependency to improve model performance. Then, we develop a parallel algorithm based on the Message Passing Interface (MPI) to further enhance model scalability. Extensive experiments on real-world datasets validate the effectiveness of our method.
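
For readers unfamiliar with the factorization named in the abstract, the following is a minimal sketch of plain, unsupervised NMTF using standard multiplicative updates. It approximates a document-by-word matrix X as F S Gᵀ, where F holds document-cluster memberships, G holds word-cluster memberships, and S is the matrix associating document clusters with word clusters. This is not the semi-supervised model proposed in the article: it omits the label-dependency priors, the domain adaptation setting, and the MPI-based parallel algorithm. The function names, toy matrix, and hyperparameters are illustrative assumptions.

```python
import numpy as np


def nmtf(X, k_docs, k_words, n_iter=200, eps=1e-9, seed=0):
    """Plain NMTF: find non-negative F, S, G with X ~= F @ S @ G.T.

    X: (documents x words) non-negative matrix.
    F: (documents x k_docs) document-cluster memberships.
    S: (k_docs x k_words) association between document and word clusters.
    G: (words x k_words) word-cluster memberships.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k_docs)) + eps
    S = rng.random((k_docs, k_words)) + eps
    G = rng.random((n, k_words)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative;
        # eps guards against division by zero.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


def reconstruction_error(X, F, S, G):
    """Frobenius norm of the residual X - F S G^T."""
    return np.linalg.norm(X - F @ S @ G.T)


# Toy corpus (assumed data): two groups of documents, two groups of words.
X = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 4.0, 3.0],
])

F, S, G = nmtf(X, k_docs=2, k_words=2)
print("residual:", reconstruction_error(X, F, S, G))
print("document clusters:", F.argmax(axis=1))
print("word clusters:", G.argmax(axis=1))
```

In a cross-domain setting, the association matrix S is the component that NMTF-based transfer methods typically share between source and target domains, while F and G remain domain-specific; the sketch above factorizes a single matrix only.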

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning settings
      1. Semi-supervised learning settings
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Sentiment analysis

Published In

ACM Transactions on Knowledge Discovery from Data Volume 17, Issue 5

June 2023

386 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/3583066

Editor:
Charu Aggarwal
IBM T. J. Watson Research, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2023

Online AM: 18 November 2022

Accepted: 08 November 2022

Revised: 08 November 2022

Received: 13 July 2021

Published in TKDD Volume 17, Issue 5

Funding Sources

National Natural Science Foundation of China
Lam Woo Research Fund
Faculty Research Grants
Research Grants Council of the HKSAR, China
City University of Hong Kong SRG
National Natural Science Foundation of China
