Abstract
We present a novel approach to domain adaptation for text categorization that requires only weak annotation of the source domain data, in the form of labeled features. Its main advantage is that labeling words is less expensive than labeling documents. We propose two methods. The first seeks to minimize the divergence between the distributions of the source domain, which contains labeled features, and the target domain, which contains only unlabeled data. The second augments the labeled feature set in an unsupervised way by discovering a latent concept space shared between source and target. We show empirically that our approach outperforms standard supervised and semi-supervised methods and obtains results competitive with those reported for state-of-the-art domain adaptation methods, while requiring considerably less supervision.
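To make the idea of a divergence between source and target distributions concrete, the sketch below computes a Kullback-Leibler divergence between smoothed unigram word distributions of two small corpora. This is an illustration under assumptions, not the method of the paper: the abstract does not say which divergence or which distributions are used, and the helper names (`unigram_distribution`, `kl_divergence`), the add-alpha smoothing, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def unigram_distribution(docs, vocab, alpha=1.0):
    """Add-alpha smoothed word distribution over a fixed vocabulary.

    Smoothing is an assumption made here so that every word has non-zero
    probability and the KL divergence below stays finite.
    """
    vocab_set = set(vocab)
    counts = Counter(w for doc in docs for w in doc.split() if w in vocab_set)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


# Toy corpora standing in for a source and a target domain (hypothetical data).
source_docs = ["the match ended in a draw", "the team won the match"]
target_docs = ["the election ended in a draw", "the party won the vote"]
vocab = sorted({w for d in source_docs + target_docs for w in d.split()})

p_source = unigram_distribution(source_docs, vocab)
p_target = unigram_distribution(target_docs, vocab)
divergence = kl_divergence(p_source, p_target)
```

A value of `divergence` near zero would indicate that the two domains use words similarly; larger values indicate a larger domain shift under this particular, assumed measure.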
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kadar, C., Iria, J. (2011). Domain Adaptation for Text Categorization by Feature Labeling. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_42
DOI: https://doi.org/10.1007/978-3-642-20161-5_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer Science (R0)