Unsupervised Feature Adaptation for Cross-Domain NLP with an Application to Compositionality Grading

Michelbacher, Lukas; Han, Qi; Schütze, Hinrich

doi:10.1007/978-3-642-37247-6_1

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Abstract

In this paper, we introduce feature adaptation, an unsupervised method for cross-domain natural language processing (NLP). Feature adaptation adapts a supervised NLP system to a new domain by recomputing feature values while retaining the model and the feature definitions used on the original domain. We demonstrate the effectiveness of feature adaptation through cross-domain experiments in compositionality grading and show that it rivals supervised target domain systems when moving from generic web text to a specialized physics text domain.
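The idea of keeping the model and feature definitions fixed while recomputing feature values can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the single feature (pointwise mutual information of a two-word phrase), the toy corpora `SOURCE` and `TARGET`, and the linear scorer with constants `WEIGHT` and `BIAS` standing in for a model trained on the source domain.

```python
import math
from collections import Counter

# Toy corpora (hypothetical): a generic source domain and a physics-like target domain.
SOURCE = "the black cat sat near the hole in the black fence".split()
TARGET = "a black hole bends light near the black hole horizon".split()
PHRASE = ("black", "hole")

# Hypothetical parameters standing in for a model trained on the source domain.
# They are NOT retrained when moving to the target domain.
WEIGHT, BIAS = 0.5, 1.0


def pmi_feature(phrase, tokens):
    """Feature definition: pointwise mutual information of a two-word phrase.

    The definition is fixed; only the corpus it is computed from changes.
    Returns 0.0 when the phrase does not occur in the corpus.
    """
    w1, w2 = phrase
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    joint = bigrams[(w1, w2)]
    if joint == 0:
        return 0.0
    n = len(tokens)
    p_joint = joint / (n - 1)          # relative frequency of the bigram
    p_w1 = unigrams[w1] / n            # relative frequency of each word
    p_w2 = unigrams[w2] / n
    return math.log2(p_joint / (p_w1 * p_w2))


def score(phrase, tokens, weight=WEIGHT, bias=BIAS):
    """Apply the unchanged linear model to feature values computed on `tokens`."""
    return weight * pmi_feature(phrase, tokens) + bias


if __name__ == "__main__":
    # Same model, same feature definition; only the corpus differs.
    print("source-domain score:", score(PHRASE, SOURCE))
    print("adapted (target-domain) score:", score(PHRASE, TARGET))
```

In this sketch, adaptation amounts to passing the target-domain corpus to the same feature function. No target-domain labels are used, which is what makes the procedure unsupervised.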

Author information

Authors and Affiliations

Institute for Natural Language Processing, University of Stuttgart, Germany
Lukas Michelbacher, Qi Han & Hinrich Schütze

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico D.F., Mexico
Alexander Gelbukh

Cite this paper

Michelbacher, L., Han, Q., Schütze, H. (2013). Unsupervised Feature Adaptation for Cross-Domain NLP with an Application to Compositionality Grading. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_1

DOI: https://doi.org/10.1007/978-3-642-37247-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science, Computer Science (R0)
