Experiments in Cross-Language Morphological Annotation Transfer

Feldman, Anna; Hana, Jirka; Brew, Chris

doi:10.1007/11671299_4


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3878)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

Annotated corpora are valuable resources for NLP, but they are often costly to create. We introduce a method for transferring annotation from a morphologically annotated corpus of a source language to a target language. Our approach assumes only that an unannotated text corpus exists for the target language and that a simple textbook describing the basic morphological properties of that language is available. Our paper describes experiments with Polish, Czech, and Russian; however, the method is not tied in any way to these languages. In all the experiments we use the TnT tagger ([3]), a second-order Markov model. Our approach assumes that information acquired about one language can be used for processing a related language. We have found that even breathtakingly naive approximations, such as approximating the Russian transitions by Czech and/or Polish ones and approximating the Russian emissions by manually or automatically derived Czech cognates, can lead to a significant improvement in the tagger's performance.
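The two approximations mentioned above can be illustrated with a short Python sketch. It is not the authors' implementation, and it rests on several assumptions. Transitions are a trigram tag model estimated on a source-language (e.g. Czech) tagged corpus and reused unchanged for the target language, with fixed interpolation weights (TnT itself estimates these weights by deleted interpolation). Emissions for a target word are borrowed from its closest source-language word, found by normalized Levenshtein distance with a hypothetical threshold of 0.34. Target words are assumed to be already transliterated into the source alphabet, since Russian is written in Cyrillic and Czech in Latin script. The function names, toy corpus, and coarse tags (N, V, A) are all hypothetical.

```python
from collections import Counter, defaultdict


def transition_model(tagged_sents, lambdas=(0.1, 0.3, 0.6)):
    """Return P(t3 | t1, t2) as a linear interpolation of unigram, bigram and
    trigram relative frequencies estimated on a SOURCE-language corpus.
    Assumption: fixed lambdas (TnT estimates them by deleted interpolation)."""
    uni, bi, tri = Counter(), Counter(), Counter()
    hist1, hist2 = Counter(), Counter()  # counts of tags / tag pairs used as history
    for sent in tagged_sents:
        tags = ["<s>", "<s>"] + [tag for _, tag in sent]
        for i in range(2, len(tags)):
            uni[tags[i]] += 1
            bi[tags[i - 1], tags[i]] += 1
            tri[tags[i - 2], tags[i - 1], tags[i]] += 1
            hist1[tags[i - 1]] += 1
            hist2[tags[i - 2], tags[i - 1]] += 1
    n = sum(uni.values())
    l1, l2, l3 = lambdas

    def prob(t1, t2, t3):
        p1 = uni[t3] / n if n else 0.0
        p2 = bi[t2, t3] / hist1[t2] if hist1[t2] else 0.0
        p3 = tri[t1, t2, t3] / hist2[t1, t2] if hist2[t1, t2] else 0.0
        return l1 * p1 + l2 * p2 + l3 * p3

    return prob


def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cognate_emissions(target_word, source_lexicon, max_dist=0.34):
    """Approximate P(tag | target_word) by borrowing the tag distribution of the
    closest source-language word. Hypothetical cognate test: normalized edit
    distance <= max_dist. source_lexicon maps source word -> Counter of tags."""
    best, best_d = None, None
    for src in source_lexicon:
        d = edit_distance(target_word, src) / max(len(target_word), len(src), 1)
        if best_d is None or d < best_d:
            best, best_d = src, d
    if best is None or best_d > max_dist:
        return {}  # no plausible cognate: fall back to other evidence
    counts = source_lexicon[best]
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.items()}


# Toy "Czech" corpus with coarse, hypothetical tags.
czech = [[("voda", "N"), ("je", "V"), ("studená", "A")],
         [("ryba", "N"), ("plave", "V")]]
P = transition_model(czech)

lexicon = defaultdict(Counter)
for sent in czech:
    for word, tag in sent:
        lexicon[word][tag] += 1

print(P("<s>", "N", "V"))                     # approximately 0.94
print(cognate_emissions("plavaet", lexicon))  # transliterated Russian; {'V': 1.0}
```

In this sketch the transliterated Russian verb "plavaet" is closest to Czech "plave" (distance 2 over 7 characters), so it inherits that word's verbal tag, while the tag transitions are taken from Czech without any Russian data.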



References

[1] Agirre, E., Atutxa, A., Gojenola, K., Sarasola, K.: Exploring Portability of Syntactic Information from English to Basque. In: Proceedings of LREC 2004, Lisbon, Portugal (2004)
[2] Bémová, A., Hajič, J., Hladká, B., Panevová, J.: Morphological and Syntactic Tagging of the Prague Dependency Treebank. In: Proceedings of ATALA Workshop, Paris, France, pp. 21–29 (1999)
[3] Brants, T.: TnT - A Statistical Part-of-Speech Tagger. In: Proceedings of ANLP-NAACL, pp. 224–231 (2000)
[4] Hajič, J.: Morphological Tagging: Data vs. Dictionaries. In: Proceedings of ANLP-NAACL Conference, Seattle, WA, USA, pp. 94–101 (2000)
[5] Hana, J.: Knowledge and labor light morphological analysis of Czech and Russian. Manuscript, Department of Linguistics, The Ohio State University (2005)
[6] Hana, J., Feldman, A., Brew, C.: A Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 222–229 (2004)
[7] Hwa, R., Resnik, P., Weinberg, A., Cabezas, C., Kolak, O.: Bootstrapping Parsers via Syntactic Projection across Parallel Texts. Natural Language Engineering 1(1), 1–15 (2004)
[8] Marcus, M., Santorini, B., Marcinkiewicz, M.A.: Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2), 313–330 (1993)
[9] Przepiórkowski, A.: The IPI PAN Corpus: Preliminary version. IPI PAN, Warszawa (2004)
[10] Yarowsky, D., Wicentowski, R.: Minimally Supervised Morphological Analysis by Multimodal Alignment. In: Proceedings of the 38th Meeting of the Association for Computational Linguistics, pp. 208–216 (2000)
[11] Yarowsky, D., Ngai, G.: Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection Across Aligned Corpora. In: Proceedings of NAACL-2001, pp. 200–207 (2001)
[12] Yarowsky, D., Ngai, G., Wicentowski, R.: Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora. In: Proceedings of HLT 2001, First International Conference on Human Language Technology Research (2001)


Author information

Authors and Affiliations

Department of Linguistics, Ohio State University, Columbus, OH, 43210-1298, USA
Anna Feldman, Jirka Hana & Chris Brew


Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh


About this paper

Cite this paper

Feldman, A., Hana, J., Brew, C. (2006). Experiments in Cross-Language Morphological Annotation Transfer. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_4


DOI: https://doi.org/10.1007/11671299_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science, Computer Science (R0)
