Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation

Han, Aaron Li-Feng; Wong, Derek F.; Chao, Lidia S.; He, Liangye; Li, Shuo; Zhu, Ling

doi:10.1007/978-3-642-40722-2_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8105)

Abstract

Many treebanks have been developed in recent years for different languages, but they usually employ different syntactic tag sets. This is an obstacle for researchers who want to take full advantage of them, especially in multilingual research. To address this problem and to facilitate future research in unsupervised induction of syntactic structures, some researchers have developed a universal POS tag set. However, the mismatch between phrase tag sets remains unsolved. To bridge the phrase-level tag sets of multilingual treebanks, this paper designs a phrase mapping between the French Treebank and the English Penn Treebank. Furthermore, one potential application of this mapping is explored in the machine translation evaluation task. This novel evaluation model, developed without using reference translations, yields promising results compared with state-of-the-art evaluation metrics.

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Macau, Av. Padre Tomás Pereira, Taipa, Macau, China
Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li & Ling Zhu

Editor information

Editors and Affiliations

Technical University Darmstadt, 64289 Darmstadt, Germany, and German Institute for International Educational Research, 60486 Frankfurt, Germany
Iryna Gurevych
Technical University Darmstadt, 64289 Darmstadt, Germany
Chris Biemann
Technical University Darmstadt, 64289 Darmstadt, Germany, and German Institute for International Educational Research, 60486 Frankfurt, Germany
Torsten Zesch

Cite this paper

Han, A.L.-F., Wong, D.F., Chao, L.S., He, L., Li, S., Zhu, L. (2013). Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_13

DOI: https://doi.org/10.1007/978-3-642-40722-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40721-5
Online ISBN: 978-3-642-40722-2
eBook Packages: Computer Science, Computer Science (R0)