Abstract
This paper presents a simple but effective sentence-length-informed method for selecting informative sentences in active learning (AL) based statistical machine translation (SMT). A length factor penalizes short sentences in order to balance exploration and exploitation. The penalty is updated dynamically at each iteration of sentence selection, using the ratio of the current candidate sentence's length to the average sentence length of the monolingual corpus. Experimental results on the NIST Chinese–English and WMT French–English language pairs show that the proposed length-penalty method outperforms both a typical selection method and random selection.
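The abstract does not give the exact scoring formula, so the following is only a minimal sketch of how a length-ratio penalty could be combined with an informativeness score during greedy selection. The names `length_penalty`, `novelty`, and `select_batch` are hypothetical. The unseen-token score, the cap of the penalty at 1.0, and the recomputation of the average over the remaining pool are assumptions made for illustration, not details taken from the paper.

```python
from typing import Iterable, List, Set


def length_penalty(sentence_len: int, avg_len: float) -> float:
    """Ratio of candidate length to average length.

    Capping at 1.0 is an assumption: it penalizes short sentences
    without rewarding very long ones.
    """
    if avg_len <= 0:
        return 0.0
    return min(1.0, sentence_len / avg_len)


def novelty(tokens: List[str], seen: Set[str]) -> float:
    """Hypothetical informativeness score: fraction of unseen tokens."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in seen) / len(tokens)


def select_batch(pool: Iterable[List[str]], seen: Set[str],
                 batch_size: int) -> List[List[str]]:
    """Greedily pick sentences by novelty multiplied by the length penalty."""
    remaining = [list(s) for s in pool]
    known = set(seen)
    chosen: List[List[str]] = []
    for _ in range(min(batch_size, len(remaining))):
        # Recompute the average at every iteration (assumed to be over
        # the sentences still available for selection).
        avg_len = sum(len(s) for s in remaining) / len(remaining)
        best = max(
            remaining,
            key=lambda s: novelty(s, known) * length_penalty(len(s), avg_len),
        )
        chosen.append(best)
        remaining.remove(best)
        known.update(best)  # tokens of the chosen sentence are now "seen"
    return chosen
```

In this sketch a one-word sentence made entirely of unseen tokens no longer wins automatically: its novelty of 1.0 is scaled down by its length ratio, so a longer sentence with slightly lower novelty can be selected first.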
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Du, J., Wang, M., Zhang, M. (2014). Sentence-Length Informed Method for Active Learning Based Resource-Poor Statistical Machine Translation. In: Zong, C., Nie, JY., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2014. Communications in Computer and Information Science, vol 496. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45924-9_9
DOI: https://doi.org/10.1007/978-3-662-45924-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45923-2
Online ISBN: 978-3-662-45924-9
eBook Packages: Computer Science, Computer Science (R0)