Sentence-Length Informed Method for Active Learning Based Resource-Poor Statistical Machine Translation

Du, Jinhua; Wang, Miaomiao; Zhang, Meng

doi:10.1007/978-3-662-45924-9_9


Part of the book series: Communications in Computer and Information Science ((CCIS,volume 496))

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

This paper presents a simple but effective sentence-length informed method for selecting informative sentences in active learning (AL) based statistical machine translation (SMT). A length factor penalizes short sentences in order to balance the trade-off between "exploration" and "exploitation". The penalty is updated dynamically at each iteration of sentence selection, using the ratio of the current candidate sentence's length to the overall average sentence length of the monolingual corpus. Experimental results on the NIST Chinese-English and WMT French-English language pairs show that the proposed sentence-length penalty method outperforms both the typical selection method and a random selection strategy.
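
The abstract does not state the exact form of the length factor, so the following is only a minimal sketch of how such a penalty might enter a selection step. It assumes the penalty is the length ratio capped at 1 (so that only shorter-than-average sentences are down-weighted) and that it multiplies some base informativeness score. The names `length_penalty`, `select_batch`, and `base_score` are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Sentence = Sequence[str]  # a tokenized sentence


def length_penalty(sentence_len: int, avg_len: float) -> float:
    """Assumed form: ratio of sentence length to average length, capped at 1."""
    if avg_len <= 0:
        raise ValueError("avg_len must be positive")
    return min(1.0, sentence_len / avg_len)


def select_batch(
    pool: List[Sentence],
    base_score: Callable[[Sentence], float],
    batch_size: int,
) -> List[Sentence]:
    """Pick the top-scoring sentences from the monolingual pool.

    The average length is recomputed from the pool on every call, so the
    penalty changes from one selection iteration to the next as the pool
    shrinks (an assumption about what "dynamically updated" means).
    """
    if not pool:
        return []
    avg_len = sum(len(s) for s in pool) / len(pool)
    scored = [
        (base_score(s) * length_penalty(len(s), avg_len), i)
        for i, s in enumerate(pool)
    ]
    # Highest penalized score first; ties broken by original pool order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pool[i] for _, i in scored[:batch_size]]
```

With a constant base score, a one-word sentence in a pool whose average length is three words receives a penalty of 1/3, so it ranks below the longer candidates. In an AL loop, the selected sentences would be removed from the pool, translated, and added to the bilingual training data before the next call.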

Author information

Authors and Affiliations

School of Automation and Information Engineering, Xi’an University of Technology, China
Jinhua Du, Miaomiao Wang & Meng Zhang
Shaanxi Key Laboratory of Complex System Control and Intelligent Information Processing, Xi’an, 710048, China
Jinhua Du & Miaomiao Wang

Editor information

Editors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 100190, Beijing, China
Chengqing Zong
Dept. of Computer Science and Operations Research, University of Montreal, Montreal, Quebec, Canada
Jian-Yun Nie
Peking University, Beijing, China
Dongyan Zhao
Institute of Computer Science & Technology, Peking University, 100871, Beijing, China
Yansong Feng

Cite this paper

Du, J., Wang, M., Zhang, M. (2014). Sentence-Length Informed Method for Active Learning Based Resource-Poor Statistical Machine Translation. In: Zong, C., Nie, JY., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2014. Communications in Computer and Information Science, vol 496. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45924-9_9

DOI: https://doi.org/10.1007/978-3-662-45924-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45923-2
Online ISBN: 978-3-662-45924-9
eBook Packages: Computer Science (R0)
