Semi-Supervised Sentiment Classification and Emotion Distribution Learning Across Domains

Published: 27 February 2023

Abstract

In this study, sentiment classification and emotion distribution learning across domains are both formulated as a semi-supervised domain adaptation problem, which uses a small number of labeled documents in the target domain for model training. By introducing a shared matrix that captures the stable association between document clusters and word clusters, non-negative matrix tri-factorization (NMTF) is robust to the labeled target domain data and has shown strong performance in cross-domain text classification. However, existing NMTF-based models ignore the incompatible relationship between sentiment polarities and the relatedness among emotions. In addition, their application to large-scale datasets is limited by high computational complexity. To address these issues, we propose a semi-supervised NMTF framework for sentiment classification and emotion distribution learning across domains. Based on a many-to-many mapping between document clusters and sentiment polarities (or emotions), we first incorporate prior information on label dependency to improve model performance. Then, we develop a parallel algorithm based on the Message Passing Interface (MPI) to further enhance model scalability. Extensive experiments on real-world datasets validate the effectiveness of our method.
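
For readers unfamiliar with NMTF, the following is a minimal illustrative sketch of the plain, unsupervised factorization X ≈ F S Gᵀ using standard multiplicative updates. It is not the semi-supervised framework proposed in the article: it omits the labeled target-domain data, the label-dependency prior, and the MPI parallelization. The function names (nmtf, reconstruction_error), the matrix orientation (documents as rows, words as columns), and all parameter defaults are assumptions chosen for illustration. Here S plays the role of the association matrix between document clusters and word clusters.

```python
import numpy as np

def nmtf(X, k_docs, k_words, n_iter=200, eps=1e-9, seed=0):
    """Plain NMTF sketch: X (docs x words) ~= F @ S @ G.T, all factors non-negative.

    F: document-cluster memberships (docs x k_docs)
    S: association between document clusters and word clusters (k_docs x k_words)
    G: word-cluster memberships (words x k_words)
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # Strictly positive random initialization keeps multiplicative updates well defined.
    F = rng.random((n_docs, k_docs)) + eps
    S = rng.random((k_docs, k_words)) + eps
    G = rng.random((n_words, k_words)) + eps
    for _ in range(n_iter):
        # Each update rescales a factor by (negative gradient part) / (positive gradient part)
        # of the squared Frobenius reconstruction error.
        F *= (X @ G @ S.T) / (F @ (S @ G.T @ G @ S.T) + eps)
        G *= (X.T @ F @ S) / (G @ (S.T @ F.T @ F @ S) + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G

def reconstruction_error(X, F, S, G):
    """Frobenius norm of the residual X - F S G^T."""
    return float(np.linalg.norm(X - F @ S @ G.T))
```

In a cross-domain setting, the intuition described in the abstract is that S stays comparatively stable across domains while F and G are domain-specific; how the article shares S and injects label information is not shown here.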

Information

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 5
June 2023
386 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3583066

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2023
Online AM: 18 November 2022
Accepted: 08 November 2022
Revised: 08 November 2022
Received: 13 July 2021
Published in TKDD Volume 17, Issue 5

Author Tags

  1. Semi-supervised learning
  2. sentiment classification
  3. emotion distribution learning
  4. non-negative matrix tri-factorization
  5. label dependency

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Lam Woo Research Fund
  • Faculty Research Grants
  • Research Grants Council of the HKSAR, China
  • City University of Hong Kong SRG
