DOI: 10.1145/1321440.1321498

A two-stage approach to domain adaptation for statistical classifiers

Published: 06 November 2007

Abstract

In this paper, we consider the problem of adapting statistical classifiers trained on source domains, where labeled examples are available, to a target domain where no labeled examples are available. One characteristic of such a domain adaptation problem is that the examples in the source domains and the target domain are known to follow different distributions. Thus a regular classification method would tend to overfit the source domains. We present a two-stage approach to domain adaptation: in the first, generalization stage, we look for a set of features generalizable across domains, and in the second, adaptation stage, we pick up useful features specific to the target domain. Observing that the exact objective function is hard to optimize, we then propose a number of heuristics to approximately achieve the goals of generalization and adaptation. Our experiments on gene name recognition using a real data set show the effectiveness of our general framework and the heuristics.
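To make the two stages concrete, here is a minimal sketch in Python with NumPy. It is not the paper's method: the abstract does not spell out the heuristics, so both heuristics below are assumptions chosen for illustration. Stage 1 keeps the features whose logistic-regression weights agree in sign across the source domains and have the largest minimum magnitude. Stage 2 pseudo-labels the most confident target examples and retrains on all features. The names `fit_logreg`, `generalizable_features`, `two_stage`, `k`, and `n_pseudo` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_logreg(X, y, mask=None, l2=1.0, lr=0.5, steps=500):
    """L2-regularized logistic regression fitted by gradient descent.
    Weights of features outside the boolean `mask` are held at zero."""
    n, d = X.shape
    mask = np.ones(d, dtype=bool) if mask is None else mask
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / n + l2 * w / n
        w -= lr * grad * mask  # masked-out coordinates never move
    return w

def generalizable_features(domains, k):
    """Stage 1 (assumed heuristic): train one classifier per source domain
    and keep the k features whose weights share a sign in every domain and
    whose smallest per-domain magnitude is largest."""
    W = np.array([fit_logreg(X, y) for X, y in domains])
    same_sign = np.all(np.sign(W) == np.sign(W[0]), axis=0)
    score = np.where(same_sign, np.abs(W).min(axis=0), 0.0)
    mask = np.zeros(W.shape[1], dtype=bool)
    mask[np.argsort(-score)[:k]] = True
    return mask

def two_stage(domains, X_target, k, n_pseudo):
    """Run both stages; returns (mask, general weights, adapted weights)."""
    # Generalization stage: train on pooled source data, general features only.
    mask = generalizable_features(domains, k)
    X_src = np.vstack([X for X, _ in domains])
    y_src = np.concatenate([y for _, y in domains])
    w_general = fit_logreg(X_src, y_src, mask=mask)

    # Adaptation stage (assumed heuristic): pseudo-label the n_pseudo most
    # confident target examples, then retrain with every feature unlocked so
    # that target-specific features can receive weight.
    p = sigmoid(X_target @ w_general)
    confident = np.argsort(-np.abs(p - 0.5))[:n_pseudo]
    X_aug = np.vstack([X_src, X_target[confident]])
    y_aug = np.concatenate([y_src, (p[confident] > 0.5).astype(float)])
    w_adapted = fit_logreg(X_aug, y_aug)
    return mask, w_general, w_adapted
```

Calling `two_stage(domains, X_target, k=3, n_pseudo=10)` with `domains` given as a list of `(X, y)` pairs returns the selected feature mask and the weight vectors from each stage. Restricting the first classifier to features that behave consistently across source domains is one way to limit overfitting to any single source domain, at the cost of ignoring features that only matter in the target until the second stage.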



    Published In

    CIKM '07: Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
    November 2007
    1048 pages
    ISBN:9781595938039
    DOI:10.1145/1321440

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. classification
    2. domain adaptation
    3. feature selection
    4. logistic regression
    5. semi-supervised learning

    Qualifiers

    • Research-article

    Conference

    CIKM07

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
