DOI: 10.1145/1871437.1871486
research-article

Collaborative Dual-PLSA: mining distinction and commonality across multiple domains for text classification

Published: 26 October 2010

Abstract

The difference in data distribution among multiple domains has been studied for the cross-domain text classification problem. In this study, we make two new observations along this line. First, the distribution difference may arise because different domains use different keywords to express the same concept. Second, the association between such a conceptual feature and the document class may be stable across domains. These two observations correspond, respectively, to the distinction and the commonality across data domains.
Inspired by these observations, we propose a generative statistical model, named Collaborative Dual-PLSA (CD-PLSA), that simultaneously captures both the distinction and the commonality among multiple domains. Unlike Probabilistic Latent Semantic Analysis (PLSA), which has only one latent variable, the proposed model has two latent factors, y and z, corresponding to word concept and document class respectively. The shared commonality intertwines with the distinctions over multiple domains and also serves as the bridge for knowledge transformation. We use an Expectation Maximization (EM) algorithm to learn this model, and also propose a distributed version to handle the situation where the data domains are geographically separated from each other. Finally, we conduct extensive experiments over hundreds of classification tasks with multiple source domains and multiple target domains to validate the superiority of CD-PLSA over existing state-of-the-art supervised and transfer learning methods. In particular, we show that CD-PLSA is more tolerant of distribution differences.
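
For reference, the single-latent-variable PLSA baseline that the abstract contrasts with CD-PLSA can be fitted with EM roughly as sketched below. This is only a minimal illustration of standard symmetric PLSA, P(d, w) = sum_z P(z) P(d|z) P(w|z). It is not the CD-PLSA model, whose factorization and update rules are not given in the abstract and are not assumed here. The function name `plsa_em`, its parameters, and the toy count matrix are hypothetical and chosen for illustration.

```python
import math
import random


def normalize(values):
    """Scale non-negative numbers so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def plsa_em(counts, num_topics, num_iters=50, seed=0):
    """Fit symmetric PLSA, P(d, w) = sum_z P(z) P(d|z) P(w|z), with EM.

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_z, p_d_given_z, p_w_given_z, log_likelihoods), where
    log_likelihoods[i] is evaluated with the parameters entering iteration i.
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    # Illustrative initialisation: uniform P(z), random positive P(d|z), P(w|z).
    p_z = normalize([1.0] * num_topics)
    p_d_z = [normalize([rng.random() + 0.1 for _ in range(num_docs)])
             for _ in range(num_topics)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(num_words)])
             for _ in range(num_topics)]
    log_likelihoods = []

    for _ in range(num_iters):
        new_z = [0.0] * num_topics
        new_d_z = [[0.0] * num_docs for _ in range(num_topics)]
        new_w_z = [[0.0] * num_words for _ in range(num_topics)]
        log_lik = 0.0
        for d in range(num_docs):
            for w in range(num_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) is joint[z] / sum(joint).
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                         for z in range(num_topics)]
                total = sum(joint)
                log_lik += n * math.log(total)
                for z in range(num_topics):
                    weighted = n * joint[z] / total
                    new_z[z] += weighted
                    new_d_z[z][d] += weighted
                    new_w_z[z][w] += weighted
        # M-step: renormalise the expected counts into probabilities.
        p_z = normalize(new_z)
        p_d_z = [normalize(row) for row in new_d_z]
        p_w_z = [normalize(row) for row in new_w_z]
        log_likelihoods.append(log_lik)
    return p_z, p_d_z, p_w_z, log_likelihoods


if __name__ == "__main__":
    # Hypothetical toy corpus: 4 documents over a 4-word vocabulary.
    toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
    p_z, p_d_z, p_w_z, lls = plsa_em(toy_counts, num_topics=2)
    print("final log-likelihood:", lls[-1])
```

The sketch uses one latent variable z for both words and documents. According to the abstract, CD-PLSA instead separates a word-concept factor y from a document-class factor z and lets the domains differ in how concepts are expressed in words, which this baseline cannot represent.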

    Published In

    CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
    October 2010
    2036 pages
    ISBN:9781450300995
    DOI:10.1145/1871437

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. classification
    2. cross-domain learning
    3. statistical generative models

    Qualifiers

    • Research-article

    Conference

    CIKM '10

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
