Online discriminative learning for machine translation with binary-valued feedback

Saluja, Avneesh; Zhang, Ying

doi:10.1007/s10590-014-9154-z

Published: 10 September 2014

Volume 28, pages 69–90 (2014)

Machine Translation

Avneesh Saluja¹ &
Ying Zhang¹

Abstract

Viewing machine translation (MT) as a structured classification problem has opened the field to a host of structured prediction techniques. In particular, large-margin methods for discriminative training of feature weights, such as the structured perceptron or MIRA, have started to match or exceed the performance of established methods such as MERT. One general issue with these problems is the difficulty of obtaining fully structured labels; in MT, for example, this means obtaining reference translations or parallel corpora for arbitrary language pairs. Another issue, more specific to translation, is the difficulty of training and updating MT systems online, since existing methods often require bilingual knowledge to correct translation outputs. The problem is an important one, especially as MT is increasingly used on mobile devices: while translating user inputs, these systems can also receive feedback from users on the quality of the translations they produce. We address both problems by demonstrating a principled way to incorporate binary-labeled feedback (i.e. feedback on whether or not a translation hypothesis is "good" or understandable) into an MT framework; this form of supervision can easily be integrated online and from monolingual users. Experimental results on Chinese–English and Arabic–English corpora, for both sparse and dense feature sets, show marked improvements on unseen test data from incorporating binary feedback, with gains in some cases exceeding 5.5 BLEU points. Experiments in which human evaluators provide the feedback show reasonable agreement with the larger-scale synthetic experiments and underline how much more easily binary feedback on translation hypotheses can be collected than parallel data.
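The abstract names MIRA-style large-margin training as the setting. The following is a minimal, hypothetical sketch of how a single pair of binary judgments could drive such an update; it is not the algorithm of this article, whose details are not shown here. It assumes two hypotheses for the same source sentence, one a user marked good and one marked bad, each represented as a sparse feature dictionary, and applies a standard passive-aggressive (PA-I) step. The names `pa_update`, `cost` and `C` are illustrative assumptions.

```python
from typing import Dict

# Sparse feature vector: feature name -> value (hypothetical representation).
Features = Dict[str, float]


def dot(w: Features, f: Features) -> float:
    """Inner product of a weight vector and a sparse feature vector."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())


def pa_update(w: Features, f_good: Features, f_bad: Features,
              cost: float = 1.0, C: float = 0.01) -> Features:
    """One PA-I step asking the 'good' hypothesis to outscore the 'bad' one by `cost`.

    Assumption: f_good / f_bad are feature vectors of two hypotheses for the
    same source sentence that a user labeled good and bad, respectively.
    """
    keys = set(f_good) | set(f_bad)
    diff = {k: f_good.get(k, 0.0) - f_bad.get(k, 0.0) for k in keys}
    sq_norm = sum(v * v for v in diff.values())
    hinge = cost - dot(w, diff)  # required margin minus current margin
    if hinge <= 0.0 or sq_norm == 0.0:
        return dict(w)  # passive: margin already satisfied (or nothing to change)
    tau = min(C, hinge / sq_norm)  # aggressive step size, clipped by C
    new_w = dict(w)
    for k, v in diff.items():
        new_w[k] = new_w.get(k, 0.0) + tau * v
    return new_w


if __name__ == "__main__":
    w = {"lm": 0.0, "tm": 0.0}
    good = {"lm": 1.0}  # hypothetical features of a hypothesis marked good
    bad = {"tm": 1.0}   # hypothetical features of a hypothesis marked bad
    w = pa_update(w, good, bad, cost=1.0, C=10.0)
    print(w)  # {'lm': 0.5, 'tm': -0.5}
```

Because the step size is clipped at `C`, a single judgment cannot move the weights arbitrarily far, which is a useful property when feedback comes from non-expert users.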

Notes

No sentences with intermediate BLEU values (i.e. greater than 0.2 and less than 0.8) were used for the human evaluation.
The maximum value is \(\frac{2\min(H(X), H(Y))}{H(X) + H(Y)}\), which in our instances was just under 1.
It is also called the "loss-augmented hypothesis" in the literature, but we avoid this term lest "loss" be taken to mean the loss function rather than the extrinsic cost function.
The implementations of these algorithms are available at https://github.com/redpony/cdec.
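The first two notes concern the human evaluation. Below is a minimal sketch under two assumptions: (i) sentence-level BLEU scores at or below 0.2 map to a "bad" label and those at or above 0.8 to a "good" label, with intermediate scores dropped as the first note describes; and (ii) the quantity bounded in the second note is the normalized mutual information \(2I(X;Y)/(H(X)+H(Y))\) between two label sequences \(X\) and \(Y\). Since \(I(X;Y) \le \min(H(X), H(Y))\), this quantity can reach 1 only when the two entropies are equal, which is why the bound can fall just short of 1. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence


def bleu_to_label(bleu: float, low: float = 0.2, high: float = 0.8) -> Optional[int]:
    """Assumed mapping: BLEU <= low -> 0 (bad), BLEU >= high -> 1 (good), else None (dropped)."""
    if bleu <= low:
        return 0
    if bleu >= high:
        return 1
    return None


def entropy(xs: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """2 I(X;Y) / (H(X) + H(Y)); defined here as 0 when both sequences are constant."""
    denom = entropy(xs) + entropy(ys)
    return 0.0 if denom == 0 else 2 * mutual_information(xs, ys) / denom


def max_normalized_mi(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Upper bound 2 min(H(X), H(Y)) / (H(X) + H(Y)) from the second note."""
    hx, hy = entropy(xs), entropy(ys)
    return 0.0 if hx + hy == 0 else 2 * min(hx, hy) / (hx + hy)


if __name__ == "__main__":
    bleu = [0.10, 0.90, 0.50, 0.85, 0.15, 0.95]  # hypothetical sentence-level BLEU
    human = [0, 1, 1, 1, 1, 1]                    # hypothetical human judgments
    labels = [bleu_to_label(b) for b in bleu]
    kept = [(a, h) for a, h in zip(labels, human) if a is not None]
    auto, hum = [a for a, _ in kept], [h for _, h in kept]
    print(normalized_mi(auto, hum), max_normalized_mi(auto, hum))
```

The bound depends only on the marginal label distributions, so it can be computed before the two labelings are compared item by item.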

Acknowledgments

We thank the anonymous reviewers who reviewed this work in its various earlier forms. The latent SSVM algorithm (without binary feedback) was implemented by the first author as a class project together with Jeff Flanigan. This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) Transformative App program under Contract No. D11PC20022 and by the DARPA Broad Operational Language Translation (BOLT) project under Contract No. HR0011-12-C-0017.

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Avneesh Saluja & Ying Zhang

Corresponding author

Correspondence to Avneesh Saluja.

About this article

Cite this article

Saluja, A., Zhang, Y. Online discriminative learning for machine translation with binary-valued feedback. Machine Translation 28, 69–90 (2014). https://doi.org/10.1007/s10590-014-9154-z

Received: 22 September 2013
Accepted: 28 August 2014
Published: 10 September 2014
Issue Date: October 2014
DOI: https://doi.org/10.1007/s10590-014-9154-z
