
Knowledge-Based Systems

Volume 156, 15 September 2018, Pages 1-11

Alignment-consistent recursive neural networks for bilingual phrase embeddings

https://doi.org/10.1016/j.knosys.2018.05.003

Abstract

Learning semantic representations of bilingual phrases is very important for statistical machine translation to overcome data sparsity and exploit semantic information. In this paper, we consider word alignments as a semantic bridge between the source and target phrases, and propose two neural networks based on the conventional recursive autoencoder that exploit word alignments to generate alignment-consistent bilingual phrase structures: one is the Alignment Enhanced Recursive Autoencoder, which incorporates a word-alignment-related error into the final objective function; the other is the Alignment Guided Recursive Neural Network, which treats word alignments as direct signals to guide phrase structure construction. We then establish the semantic correspondences between the source and target nodes of the generated bilingual phrase structures via word alignments. By jointly minimizing recursive autoencoder reconstruction errors, structural alignment consistency errors and cross-lingual reconstruction errors, our model not only generates alignment-consistent phrase structures, but also captures different levels of semantic correspondences within bilingual phrases. Experiments on the NIST Chinese–English translation task show that our model achieves significant improvements over the baseline.

Introduction

How to accurately learn the semantic representations of bilingual phrases has become a hot research topic in cross-lingual natural language processing (NLP) tasks, such as statistical machine translation and cross-lingual information retrieval, over the past decade. With the rapid development of deep learning, a variety of “deep architecture” approaches, including autoencoders [31], [32], [33], [34], [35], [36], have been successfully used to implement bilingual phrase embeddings [10], [20], [22], [38], [39], [40], [44], [46]. These approaches originally represent words as dense, low-dimensional and real-valued vectors. However, in addition to words, the text units exploited in the above-mentioned tasks often include phrases (sequences of words), whose syntactic and semantic information cannot be adequately captured and represented by word embeddings alone. Therefore, learning vector representations for phrases or even longer expressions is crucial for successful “deep” cross-lingual NLP models.

Inspired by the success of work on monolingual phrase embeddings [3], [17], [18], [31], [32], [33], [34], [35], [36], many efforts have been devoted to learning bilingual phrase embeddings [9], [11], [21], [23], [45]. However, these studies mainly focus on capturing relations between entire source and target phrases, while ignoring internal phrase structures and the bilingual correspondences of sub-phrases within source and target phrases. This is mainly because integrating them into the learning process of bilingual phrase representations poses a considerable challenge. Nevertheless, we believe such internal structures and semantic correspondences can help us learn better phrase representations because they provide multi-level syntactic and semantic constraints.

In this paper, we study how to leverage word alignments to learn better bilingual phrase structures and representations. Substantially extending the conventional Bilingually-constrained Recursive Autoencoders (BRAE) [45], we propose two neural networks that exploit inner structural consistency to generate alignment-consistent phrase structures, and then model different levels of semantic correspondences within bilingual phrases to learn better bilingual phrase embeddings. The intuitions behind our model are twofold: (1) the generated bilingual phrase structures should satisfy word alignment constraints as much as possible; and (2) the corresponding sub-phrases on the source and target sides of bilingual phrases should be able to reconstruct each other because they are semantic equivalents. To model the first intuition, we simultaneously maximize the semantic combination scores of bilingual structures consistent with word alignments and minimize those of bilingual structures inconsistent with word alignments. This enables our model to produce alignment-consistent bilingual phrase structures. With regard to the second intuition, we reconstruct sub-phrase structures of one language according to aligned nodes in the other language and minimize the semantic distances between original and reconstructed structures. In doing so, our model is capable of capturing semantic correspondences at different levels.
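To make these two intuitions concrete, the following is a minimal, illustrative sketch of how the error terms discussed above could be combined into a single training objective. It is not the paper's actual formulation: the function and variable names, the margin of 1, and the interpolation weights alpha and beta are assumptions introduced here only for illustration.

```python
def joint_loss(rec_err_src, rec_err_tgt,
               score_consistent, score_inconsistent,
               cross_rec_err,
               alpha=0.15, beta=0.15):
    """Illustrative combination of the error terms discussed above.

    rec_err_src, rec_err_tgt : monolingual RAE reconstruction errors
    score_consistent         : combination score of an alignment-consistent structure
    score_inconsistent       : combination score of an alignment-violating structure
    cross_rec_err            : cross-lingual reconstruction error between aligned nodes
    alpha, beta              : hypothetical interpolation weights
    """
    # Max-margin term: the alignment-consistent structure should outscore
    # the inconsistent one by at least a margin of 1.
    consistency_err = max(0.0, 1.0 - score_consistent + score_inconsistent)
    return (rec_err_src + rec_err_tgt
            + alpha * consistency_err
            + beta * cross_rec_err)
```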

To better illustrate our model, let us consider the example in Fig. 1. BRAE neglects the semantic correspondences of sub-phrases within bilingual phrases. Thus, it may combine “adopted” and “today” together to generate an undesirable target tree structure that violates the word alignments. In contrast, our model aligns source-side nodes (e.g. “tōngguò” and “juéyì”) to their corresponding target-side nodes (“adopted” and “resolution”, respectively) according to the word alignments. Furthermore, in our model, each subtree on the target side can be reconstructed from the source node aligned to it, and vice versa. These advantages allow us to obtain improved bilingual phrase embeddings with better inner correspondences of sub-phrases and word alignment consistency.

We conduct experiments with a state-of-the-art statistical machine translation (SMT) system on large-scale data to evaluate the effectiveness of the proposed model. Results on the Chinese–English translation task show that our model achieves significant improvements over the baselines. The main contributions of our work lie in the following three aspects:

  • We utilize word alignments to generate alignment-consistent bilingual phrase structures. To the best of our knowledge, this has not been investigated before.

  • We perform cross-lingual semantic unfolding to model semantic equivalence of sub-phrases within bilingual phrases. In this way, different granularities of semantic equivalents can be simultaneously exploited for bilingual phrase embeddings.

  • We integrate various similarity features based on bilingual phrase representations and tree structures learned by our model to enhance translation candidate selection in SMT.

The first proposed neural network has been presented in our previous paper [37]. In this paper, we make the following two significant extensions to our previous work:

  • Different from BRAE and our previous work, where the generated phrase structures are affected by reconstruction errors, in the second proposed neural network we further explore a semantic composition metric that more directly exploits word alignments to guide the generation of bilingual phrase structures.

  • We carry out more experiments on larger-scale data sets and provide more details of the proposed models. In particular, we investigate the effects of various factors on our model. Moreover, we analyze in depth how the proposed models improve translation quality through intrinsic and extrinsic evaluations of the translation results.

The remainder of this article is organized as follows. Section 2 summarizes the related work and highlights the differences between our model and previous studies. Section 3 briefly describes the conventional recursive autoencoder (RAE) and BRAE, both of which are the bases of our model. Section 4 gives details of the proposed model, such as the modeling procedure, objective function and model training. Section 5 reports the experimental results on the Chinese-to-English translation task. Section 6 studies in depth how the learned bilingual phrasal similarities improve translation quality. Finally, we conclude in Section 7 with future directions.

Section snippets

Related work

The body of research work on bilingual phrase representation originates from studies of monolingual phrase representation, which aim at learning a real-valued vector to semantically represent a given phrase. In this respect, conventional studies mainly focus on designing manual and task-dependent features to represent phrase meanings [4]. Despite their success, their inability to deal with data sparseness limits their further development. To overcome this drawback, Bengio et al. [1]

RAE and BRAE

In this section, we briefly introduce the RAE and its bilingual variation BRAE, both of which are the bases of our proposed model.
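As a reminder of the basic building block, here is a minimal numpy sketch of a single RAE merge step in the style of Socher et al. The non-linearity on the reconstruction and the parameter names are assumptions made for illustration, not necessarily the exact formulation used in this paper.

```python
import numpy as np

def rae_merge(c1, c2, W1, b1, W2, b2):
    """One RAE composition step over two child vectors (illustrative sketch).

    c1, c2 : child embeddings of dimension n
    W1, b1 : composition parameters, shapes (n, 2n) and (n,)
    W2, b2 : reconstruction parameters, shapes (2n, n) and (2n,)
    """
    children = np.concatenate([c1, c2])
    parent = np.tanh(W1 @ children + b1)            # composed parent vector
    rec = np.tanh(W2 @ parent + b2)                 # reconstructed children
    rec_err = 0.5 * np.sum((children - rec) ** 2)   # reconstruction error
    return parent, rec_err
```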

The proposed model

As discussed previously, the learned phrase embeddings using BRAE may be unreasonable due to the neglect of internal phrase structures and bilingual constraints at different levels. To address this issue, we propose to leverage word alignments to learn bilingual phrase structures and representations. In our model, word alignments serve as a semantic bridge between the source and target phrases in two aspects: (1) ensuring that the learned bilingual phrase structures are consistent with word
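For a concrete sense of what "consistent with word alignments" can mean, the sketch below checks whether a source span maps onto a self-contained target span under a set of alignment links, in the standard phrase-pair-consistency sense. It is an illustrative stand-in for the paper's constraint, and the function and variable names are assumptions introduced here.

```python
def span_is_alignment_consistent(src_span, alignments):
    """Check whether a source span maps onto a self-contained target span
    under the given word alignments (standard phrase-pair consistency,
    shown as an illustrative stand-in for the paper's constraint).

    src_span   : (i, j) inclusive source word indices
    alignments : iterable of (src_idx, tgt_idx) alignment links
    """
    i, j = src_span
    links = list(alignments)
    tgt_positions = {t for (s, t) in links if i <= s <= j}
    if not tgt_positions:
        return True  # unaligned spans impose no constraint
    lo, hi = min(tgt_positions), max(tgt_positions)
    # No target word inside [lo, hi] may be aligned outside the source span.
    return all(i <= s <= j for (s, t) in links if lo <= t <= hi)
```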

Experiments

We conducted experiments on the NIST Chinese–English translation task to validate the effectiveness of the proposed model.

Result analyses

To understand more intuitively how our model improves the SMT system, we analyzed the experimental results from four angles.

First, we extracted phrase pairs from the word-aligned NIST MT data sets and constructed two negative examples for each phrase pair using the method described in Section 3.2. Then, we calculated the phrasal similarities between phrase pairs and non-translation ones using different models, and verified whether our model is able to better identify the correct phrase
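The following is a minimal sketch of this kind of check, assuming the phrasal similarity is realized as cosine similarity between source and target phrase embeddings; the function names and the ranking criterion are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def phrasal_similarity(src_vec, tgt_vec):
    """Cosine similarity between source and target phrase embeddings,
    one plausible realization of the phrasal similarity used here."""
    denom = np.linalg.norm(src_vec) * np.linalg.norm(tgt_vec) + 1e-8
    return float(np.dot(src_vec, tgt_vec) / denom)

def ranks_translation_first(true_pair, negative_pairs):
    """True if the genuine translation pair scores higher than every
    constructed negative example (illustrative evaluation criterion)."""
    true_score = phrasal_similarity(*true_pair)
    return all(true_score > phrasal_similarity(s, t) for s, t in negative_pairs)
```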

Conclusion and future work

In this paper, we have studied how to leverage word alignments to learn better bilingual phrase structures and vector representations. By exploiting word alignments to encourage the generated bilingual phrase structures to satisfy word alignment constraints as much as possible, and by minimizing the semantic distance between the vector representations of original and reconstructed structures, our model is able to not only generate alignment-consistent phrase structures, but also capture different

Acknowledgments

The authors were supported by the Beijing Advanced Innovation Center for Language Resources, the National Natural Science Foundation of China (Nos. 61672440 and 61622209), the Scientific Research Project of the National Language Committee of China (Grant No. YB135-49), and the Natural Science Foundation of Fujian Province of China (No. 2016J05161).

References (46)

  • Y. Bengio et al.

    A neural probabilistic language model

    J. Mach. Learn. Res.

    (2003)
  • J. Bergstra et al.

    Random search for hyper-parameter optimization

    J. Mach. Learn. Res.

    (2012)
  • D. Chen et al.

    A fast and accurate dependency parser using neural networks

    Proceedings of the Empirical Methods in Natural Language Processing

    (2014)
  • S.F. Chen et al.

    An empirical study of smoothing techniques for language modeling

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (1996)
  • K. Cho et al.

    Learning phrase representations using RNN encoder–decoder for statistical machine translation

    Proceedings of the Empirical Methods in Natural Language Processing

    (2014)
  • J. Chung et al.

    Empirical evaluation of gated recurrent neural networks on sequence modeling

    NIPS 2014 Workshop on Deep Learning, December

    (2014)
  • J.H. Clark et al.

    Better hypothesis testing for statistical machine translation: controlling for optimizer instability

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (2011)
  • R. Collobert et al.

    Natural language processing (almost) from scratch

    J. Mach. Learn. Res.

    (2011)
  • L. Cui et al.

    Learning topic representation for SMT with neural networks

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (2014)
  • J. Devlin et al.

    Fast and robust neural network joint models for statistical machine translation

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (2014)
  • J. Gao et al.

    Learning continuous phrase representations for translation modeling

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (2014)
  • K.M. Hermann et al.

    Multilingual models for compositional distributed semantics

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (2014)
  • G. Hinton

    A practical guide to training restricted Boltzmann machines

    Neural Networks: Tricks of the Trade (2nd ed.)

    (2012)
  • S. Hochreiter et al.

    Long short-term memory

    Neural Comput.

    (1997)
  • B. Hu et al.

    Context-dependent translation selection using convolutional neural network

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (2015)
  • N. Kalchbrenner et al.

    Recurrent continuous translation models

    Proceedings of the Empirical Methods in Natural Language Processing

    (2013)
  • N. Kalchbrenner et al.

    A convolutional neural network for modelling sentences

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (2014)
  • Y. Kim

    Convolutional neural networks for sentence classification

    Proceedings of the Empirical Methods in Natural Language Processing

    (2014)
  • P. Koehn

    Statistical significance tests for machine translation evaluation

    Proceedings of the Empirical Methods in Natural Language Processing

    (2004)
  • T. Kočiský et al.

    Learning bilingual word representations by marginalizing alignments

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (2014)
  • P. Li et al.

    Recursive autoencoders for ITG-based translation

    Proceedings of the Empirical Methods in Natural Language Processing

    (2013)
  • L. Liu et al.

    Additive neural networks for statistical machine translation

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (2013)
  • S. Liu et al.

    A recursive recurrent neural network for statistical machine translation

    Proceedings of the Annual Meeting on Association for Computational Linguistics

    (2014)