
Towards Machine Translation in Semantic Vector Space

Published: 20 April 2015

Abstract

Measuring the quality of translation rules and their compositions is an essential issue in the conventional statistical machine translation (SMT) framework. Previous lexical and phrasal translation probabilities are calculated only from co-occurrence statistics in the bilingual corpus and may not be reliable due to the data sparseness problem. To address this issue, we propose measuring the quality of translation rules and their compositions in a semantic vector embedding space (VES). We present a recursive neural network (RNN)-based translation framework that includes two submodels. The first is a bilingually-constrained recursive auto-encoder, which converts lexical translation rules into compact real-valued vectors in the semantic VES. The second is a type-dependent recursive neural network, which performs decoding by minimizing the semantic gap (meaning distance) between the source-language string and its translation candidates at each state in a bottom-up structure. The RNN-based translation model is trained with a max-margin objective function that maximizes the margin between the reference translation and the n-best translations produced by forced decoding. In the experiments, we first show that the proposed vector representations of translation rules are reliable enough for use in translation modeling. We further show that the proposed type-dependent, RNN-based model significantly improves translation quality in a large-scale, end-to-end Chinese-to-English translation evaluation.
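The bottom-up composition and max-margin training described in the abstract can be sketched as follows. This is an illustrative toy only, not the authors' exact model: the tanh composition layer, the Euclidean meaning distance, and the names `compose`, `semantic_distance`, and `max_margin_loss` are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimensionality (toy size; a real model would use far more)

# Parameters of a single recursive composition layer (auto-encoder style):
# two child vectors are concatenated and mapped back to one parent vector.
W = rng.normal(scale=0.1, size=(d, 2 * d))
b = np.zeros(d)

def compose(c1, c2):
    """One bottom-up step: merge two child embeddings into a parent embedding."""
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

def semantic_distance(src_vec, trg_vec):
    """Meaning distance between a source span and a translation candidate
    (squared Euclidean distance, assumed here for simplicity)."""
    return float(np.sum((src_vec - trg_vec) ** 2))

def max_margin_loss(src_vec, ref_vec, nbest_vecs, margin=1.0):
    """Hinge loss: the reference translation should be closer to the source
    than every n-best candidate by at least `margin`."""
    d_ref = semantic_distance(src_vec, ref_vec)
    return sum(max(0.0, margin + d_ref - semantic_distance(src_vec, v))
               for v in nbest_vecs)
```

In this sketch, minimizing `max_margin_loss` over the composition parameters pushes the embedding of the reference translation toward the source embedding while pushing competing n-best candidates away, which is the intuition behind training by forced decoding.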


Cited By

  • (2020) An efficient Long Short-Term Memory model based on Laplacian Eigenmap in artificial neural networks. Applied Soft Computing 91, 106218. DOI: 10.1016/j.asoc.2020.106218. Online publication date: Jun-2020.


    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 14, Issue 2
    March 2015
    96 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/2764912

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 April 2015
    Accepted: 01 October 2014
    Revised: 01 August 2014
    Received: 01 April 2014
    Published in TALLIP Volume 14, Issue 2


    Author Tags

    1. max-margin training
    2. recursive neural network
    3. semantic meaning distance
    4. statistical machine translation
    5. vector embedding space

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Natural Science Foundation of China
    • High New Technology Research and Development Program of Xinjiang Uyghur Autonomous Region
    • International Science & Technology Cooperation Program of China
