research-article

Extracting discriminative concepts for domain adaptation in text mining

Authors:

Tak-Lam Wong

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 179 - 188

https://doi.org/10.1145/1557019.1557045

Published: 28 June 2009

Abstract

One common predictive modeling challenge that arises in text mining problems is that the training data and the operational (testing) data are drawn from different underlying distributions. This poses a great difficulty for many statistical learning methods. However, when the distributions in the source domain and the target domain are not identical but related, there may exist a shared concept space that preserves the relation. Consequently, a good feature representation can encode this concept space and minimize the distribution gap. To formalize this intuition, we propose a domain adaptation method that parameterizes this concept space by a linear transformation, under which we explicitly minimize the distribution difference between the source domain, which has sufficient labeled data, and the target domains, which have only unlabeled data, while at the same time minimizing the empirical loss on the labeled data in the source domain. Another characteristic of our method is its capability to consider multiple classes and their interactions simultaneously. We have conducted extensive experiments on two common text mining problems, namely information extraction and document classification, to demonstrate the effectiveness of our proposed method.
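The joint objective described in the abstract (a linear transformation into a concept space, a distribution-difference term between domains, and an empirical loss on labeled source data) can be illustrated with a small sketch. It rests on choices the abstract does not specify and is not the authors' algorithm: the distribution difference is assumed to be a linear-kernel maximum mean discrepancy (the squared distance between the projected domain means), the empirical loss is assumed to be a multinomial logistic (softmax) loss so that all classes are handled jointly, there is a single target domain, and both terms are minimized by plain gradient descent. The names `linear_mmd_sq`, `objective`, and `fit`, and the parameters `k`, `lam`, `lr`, and `steps`, are hypothetical.

```python
import numpy as np


def linear_mmd_sq(Zs, Zt):
    """Squared MMD with a linear kernel: ||mean(Zs) - mean(Zt)||^2 (assumed gap measure)."""
    diff = Zs.mean(axis=0) - Zt.mean(axis=0)
    return float(diff @ diff)


def softmax(A):
    A = A - A.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)


def objective(W, V, Xs, ys, Xt, lam):
    """Source-domain softmax loss + lam * distribution gap in the concept space.

    W (d x k) maps raw features into a k-dimensional concept space;
    V (k x C) is a hypothetical multi-class classifier acting on that space.
    """
    Zs, Zt = Xs @ W, Xt @ W
    P = softmax(Zs @ V)
    loss = -np.mean(np.log(P[np.arange(len(ys)), ys] + 1e-12))
    gap = linear_mmd_sq(Zs, Zt)
    return loss + lam * gap, loss, gap


def fit(Xs, ys, Xt, k=2, lam=1.0, lr=0.02, steps=300, seed=0):
    """Jointly learn W and V by gradient descent (an assumed optimizer)."""
    rng = np.random.default_rng(seed)
    n_classes = int(ys.max()) + 1
    W = 0.1 * rng.standard_normal((Xs.shape[1], k))
    V = 0.1 * rng.standard_normal((k, n_classes))
    Y = np.eye(n_classes)[ys]               # one-hot source labels
    d = Xs.mean(axis=0) - Xt.mean(axis=0)   # raw-feature mean difference between domains
    history = []
    for _ in range(steps):
        Zs = Xs @ W
        G = (softmax(Zs @ V) - Y) / len(ys)  # gradient of the mean loss w.r.t. the logits
        grad_V = Zs.T @ G
        # gap = d^T W W^T d, so its gradient w.r.t. W is 2 d d^T W
        grad_W = Xs.T @ G @ V.T + 2.0 * lam * np.outer(d, d) @ W
        W -= lr * grad_W
        V -= lr * grad_V
        history.append(objective(W, V, Xs, ys, Xt, lam)[0])
    return W, V, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ys = rng.integers(0, 3, size=60)   # three source classes
    Xs = rng.standard_normal((60, 5))
    Xs[:, 0] += 2.0 * ys               # feature 0 carries class information
    Xt = rng.standard_normal((80, 5))
    Xt[:, 4] += 3.0                    # feature 4 shifts between domains
    W, V, hist = fit(Xs, ys, Xt)
    print(f"objective: {hist[0]:.4f} -> {hist[-1]:.4f}")
```

Running the script prints the objective after the first and the last gradient step. The weight `lam` trades source-domain fit against alignment of the two domains in the concept space; with `lam=0` the sketch reduces to a factored linear softmax classifier trained on source data alone.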

Supplementary Material

JPG File (p179-chen.jpg), 10.59 KB

MP4 File (p179-chen.mp4), 91.81 MB



Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Feature selection
2. Information systems
  1. Information systems applications

Published In

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

June 2009

1426 pages

ISBN:9781605584959

DOI:10.1145/1557019

General Chairs:
John Elder (Elder Research, Inc., USA)
Françoise Soulié Fogelman (KXEN, France)

Program Chairs:
Peter Flach (University of Bristol, UK)
Mohammed Zaki (RPI, USA)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD09: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

June 28 - July 1, 2009

Paris, France

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
