DOI:10.1145/2505515.2505556
research-article

A partially supervised cross-collection topic model for cross-domain text classification

Published: 27 October 2013

Abstract

Cross-domain text classification aims to train an accurate text classifier for a target domain using labeled text data from a related source domain. One of the most promising approaches is to induce a new feature representation that reduces the distributional difference between domains, so that a more accurate classifier can be learned in the new feature space. However, most existing methods do not explore the duality between the marginal distribution of examples and the conditional distribution of class labels given the labeled training examples in the source domain. Moreover, few previous works explicitly distinguish domain-independent from domain-specific latent features and align the domain-specific features to further improve cross-domain learning. In this paper, we propose the Partially Supervised Cross-Collection LDA topic model (PSCCLDA), which addresses these two issues in a unified way. Experimental results on nine datasets show that our model outperforms two standard classifiers and four state-of-the-art methods, demonstrating its effectiveness.
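
The setting described in the abstract can be written out in generic domain-adaptation notation. The block below is only a sketch: the symbols (D_s, D_t, P_s, P_t, the feature map phi, the loss ell) are assumptions introduced here for illustration and are not taken from the paper, and the block does not describe how PSCCLDA itself builds the representation. It shows the two distributions the abstract refers to, the marginal P(x) over examples and the conditional P(y | x) over class labels, and the goal of a representation under which a classifier trained on source labels transfers to the target domain.

```latex
% Generic cross-domain classification setting.
% All notation is assumed for illustration; it is not the paper's notation.
\begin{align*}
\mathcal{D}_s &= \{(x_i, y_i)\}_{i=1}^{n_s}, \quad x_i \sim P_s(x),\; y_i \sim P_s(y \mid x_i)
  && \text{(labeled source domain)} \\
\mathcal{D}_t &= \{x_j\}_{j=1}^{n_t}, \quad x_j \sim P_t(x)
  && \text{(unlabeled target domain)} \\
P_s(x) &\neq P_t(x)
  && \text{(marginals differ across domains)} \\
\text{seek } \phi &:\; P_s(\phi(x)) \approx P_t(\phi(x)), \quad
  P_s(y \mid \phi(x)) \approx P_t(y \mid \phi(x))
  && \text{(new feature representation)} \\
\hat{f} &= \arg\min_{f} \sum_{i=1}^{n_s} \ell\bigl(f(\phi(x_i)),\, y_i\bigr)
  && \text{(classifier trained on source labels)}
\end{align*}
```

Under these assumptions, the learned classifier is applied to target documents as \hat{f}(\phi(x_j)); how well it transfers depends on how closely the two approximations in the fourth line hold.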

Published In

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN:9781450322638
DOI:10.1145/2505515
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross-domain learning
  2. lda
  3. text classification
  4. topic modeling

Qualifiers

  • Research-article

Conference

CIKM'13
CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
October 27 - November 1, 2013
San Francisco, California, USA

Acceptance Rates

CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2023) Detecting Symptoms of Depression on Reddit. Proceedings of the 15th ACM Web Science Conference 2023, pages 174-183. DOI:10.1145/3578503.3583621. Online publication date: 30-Apr-2023
  • (2023) Coherent Topic Modeling for Creative Multimodal Data on Social Media. Proceedings of the ACM Web Conference 2023, pages 3923-3927. DOI:10.1145/3543507.3587433. Online publication date: 30-Apr-2023
  • (2023) Challenges and Issues in Sentiment Analysis: A Comprehensive Survey. IEEE Access, 11, pages 69626-69642. DOI:10.1109/ACCESS.2023.3293041. Online publication date: 2023
  • (2022) A BERT-Based Aspect-Level Sentiment Analysis Algorithm for Cross-Domain Text. Computational Intelligence and Neuroscience, 2022. DOI:10.1155/2022/8726621. Online publication date: 1-Jan-2022
  • (2022) Cross-Lingual Knowledge Transferring by Structural Correspondence and Space Transfer. IEEE Transactions on Cybernetics, 52:7, pages 6555-6566. DOI:10.1109/TCYB.2021.3051005. Online publication date: Jul-2022
  • (2021) Lost in Transduction: Transductive Transfer Learning in Text Classification. ACM Transactions on Knowledge Discovery from Data, 16:1, pages 1-21. DOI:10.1145/3453146. Online publication date: 20-Jul-2021
  • (2021) Coarse Alignment of Topic and Sentiment: A Unified Model for Cross-Lingual Sentiment Classification. IEEE Transactions on Neural Networks and Learning Systems, 32:2, pages 736-747. DOI:10.1109/TNNLS.2020.2979225. Online publication date: Feb-2021
  • (2020) A Micro Perspective of Research Dynamics Through “Citations of Citations” Topic Analysis. Journal of Data and Information Science, 5:4, pages 19-34. DOI:10.2478/jdis-2020-0034. Online publication date: 28-Jul-2020
  • (2020) Knowledge-Based Topic Model for Multi-Modal Social Event Analysis. IEEE Transactions on Multimedia, 22:8, pages 2098-2110. DOI:10.1109/TMM.2019.2951194. Online publication date: Aug-2020
  • (2019) Multi-modal max-margin supervised topic model for social event analysis. Multimedia Tools and Applications, 78:1, pages 141-160. DOI:10.1007/s11042-017-5605-x. Online publication date: 1-Jan-2019
