Abstract
Many treebanks have been developed in recent years for different languages, but they usually employ different syntactic tag sets. This makes it hard for other researchers to take full advantage of them, especially in multilingual research. To address this problem and to facilitate future research in unsupervised induction of syntactic structures, some researchers have developed a universal POS tag set. However, the mismatch between phrase tag sets remains unsolved. To bridge the phrase-level tag sets of multilingual treebanks, this paper designs a phrase mapping between the French Treebank and the English Penn Treebank. It also explores one potential application of this mapping in the machine translation evaluation task. The resulting evaluation model, which does not use reference translations, yields promising results compared with state-of-the-art evaluation metrics.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Han, A.LF., Wong, D.F., Chao, L.S., He, L., Li, S., Zhu, L. (2013). Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_13
DOI: https://doi.org/10.1007/978-3-642-40722-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40721-5
Online ISBN: 978-3-642-40722-2
eBook Packages: Computer Science, Computer Science (R0)