
Towards Machine Translation in Semantic Vector Space

Published: 20 April 2015

Abstract

Measuring the quality of translation rules and their compositions is an essential issue in the conventional statistical machine translation (SMT) framework. Previous lexical and phrasal translation probabilities are calculated only from co-occurrence statistics in the bilingual corpus and may not be reliable due to the data sparseness problem. To address this issue, we propose measuring the quality of translation rules and their compositions in a semantic vector embedding space (VES). We present a recursive neural network (RNN)-based translation framework that includes two submodels. The first is a bilingually-constrained recursive auto-encoder, which converts lexical translation rules into compact real-valued vectors in the semantic VES. The second is a type-dependent recursive neural network, which performs decoding by minimizing the semantic gap (meaning distance) between the source-language string and its translation candidates at each state in a bottom-up structure. The RNN-based translation model is trained with a max-margin objective function that maximizes the margin between the reference translation and the n-best translations produced by forced decoding. In the experiments, we first show that the proposed vector representations of translation rules are reliable enough for use in translation modeling. We further show that the proposed type-dependent, RNN-based model significantly improves translation quality in a large-scale, end-to-end Chinese-to-English translation evaluation.
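The bottom-up composition and max-margin training described in the abstract can be sketched as follows. This is an illustrative toy only, not the authors' exact model: the tanh composition layer, the Euclidean meaning distance, and the names `compose`, `semantic_distance`, and `max_margin_loss` are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimensionality (toy size; a real model would use far more)

# Parameters of a single recursive composition layer (auto-encoder style):
# two child vectors are concatenated and mapped back to one parent vector.
W = rng.normal(scale=0.1, size=(d, 2 * d))
b = np.zeros(d)

def compose(c1, c2):
    """One bottom-up step: merge two child embeddings into a parent embedding."""
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

def semantic_distance(src_vec, trg_vec):
    """Meaning distance between a source span and a translation candidate
    (squared Euclidean distance, assumed here for simplicity)."""
    return float(np.sum((src_vec - trg_vec) ** 2))

def max_margin_loss(src_vec, ref_vec, nbest_vecs, margin=1.0):
    """Hinge loss: the reference translation should be closer to the source
    than every n-best candidate by at least `margin`."""
    d_ref = semantic_distance(src_vec, ref_vec)
    return sum(max(0.0, margin + d_ref - semantic_distance(src_vec, v))
               for v in nbest_vecs)
```

In this sketch, minimizing `max_margin_loss` over the composition parameters pushes the embedding of the reference translation toward the source embedding while pushing competing n-best candidates away, which is the intuition behind training by forced decoding.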


Cited By

  • (2020) An efficient Long Short-Term Memory model based on Laplacian Eigenmap in artificial neural networks. Applied Soft Computing 91, 106218. DOI: 10.1016/j.asoc.2020.106218. Online publication date: Jun-2020.


    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 14, Issue 2
    March 2015
    96 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/2764912

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 April 2015
    Accepted: 01 October 2014
    Revised: 01 August 2014
    Received: 01 April 2014
    Published in TALLIP Volume 14, Issue 2


    Author Tags

    1. max-margin training
    2. recursive neural network
    3. semantic meaning distance
    4. statistical machine translation
    5. vector embedding space

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Natural Science Foundation of China
    • High New Technology Research and Development Program of Xinjiang Uyghur Autonomous Region
    • International Science & Technology Cooperation Program of China
