Deep hierarchical encoding model for sentence semantic matching

https://doi.org/10.1016/j.jvcir.2020.102794

Abstract

Sentence semantic matching (SSM) plays a critical role in natural language processing. Measuring the intrinsic semantic similarity between sentences is very challenging and has not been substantially addressed. Recent SSM research usually relies on a shallow text representation and shallow interaction between sentence pairs, which may be insufficient to capture complex semantic features and thus leads to limited performance. To capture more semantic context features and interactions, we propose a hierarchical encoding model (HEM) for sentence representation, further enhanced by a hierarchical matching mechanism for sentence interaction. Given two sentences, HEM generates intermediate and final representations in the encoding layer, which are further handled by a novel hierarchical matching mechanism to capture multi-view interactions in the matching layer. Comprehensive experiments demonstrate that our model is capable of capturing more sentence semantic features and interactions, and that it significantly outperforms existing state-of-the-art neural models on a public real-world dataset.

Introduction

Sentence semantic matching (SSM) compares two sentences and identifies their semantic relationship, which is a key challenge in the natural language processing (NLP) community [1], [2]. SSM is a fundamental task for various downstream NLP applications. For instance, SSM has been utilized to infer the matching degree between questions and candidate answers for question answering (QA) [3], and to assess the relevance between queries and documents for information retrieval (IR) [4], [5]. For natural language inference (NLI), SSM has been employed to predict whether a premise sentence entails a hypothesis sentence [1], [6], and for semantic equivalence identification (SEI), to judge whether two sentences have the same meaning and intention [2], [7]. Clearly, the SSM task is significant for many NLP applications because of its irreplaceable role in semantic analysis.

There are two main challenges in SSM research: (1) learning a proper text representation that captures the semantic features within sentences, and (2) designing a matching mechanism that captures the complex interactions and relationships between sentences.

To learn a proper text representation, various neural network models have been applied as deep learning technology has become prevalent. For example, the convolutional neural network (CNN) and its variants have been utilized to encode texts and have achieved good results on SSM-based text classification tasks [8]. Similarly, CNN has been used to encode question and answer pairs for the QA task, outperforming traditional models [3], [9], and an attentive CNN model significantly improves NLI performance [10]. CNN's strength is to automatically capture features from different aspects of the input data and map them into a fixed-length embedding, without depending on any manually assigned features [11], [12], [13], [14]. Though CNN is good at capturing and representing local features with different kernels, it is less suitable for temporal tasks since it ignores the sequential information in texts. Because word order matters for text semantic representation, most NLP tasks depend on sequential information to make decisions, so it is crucial to capture and represent the sequential order of texts. The recurrent neural network (RNN) was therefore proposed to model the sequential information in texts [15]. As an extension of RNN, long short-term memory (LSTM) is able to capture both long-term and short-term dependencies, and it has become prevalent in the NLP community owing to its distinguished performance. Many LSTM-based deep learning models have contributed to various NLP tasks, including machine translation, text classification and question answering [16]. These deep learning based models excel at learning local or sequential semantic features. However, most of them only generate a final embedding without considering a hierarchical representation, which may lose important intermediate encoding information [17].
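
To make the contrast concrete, the following minimal sketch (a PyTorch illustration, not the code of any cited model; all dimensions are our own assumptions) shows a CNN encoder that pools local n-gram features into a fixed-length embedding, and a BiLSTM encoder that preserves word order by emitting per-token hidden states:

    import torch
    import torch.nn as nn

    class CNNEncoder(nn.Module):
        """Convolves several kernel widths over the token embeddings and
        max-pools over time, yielding a fixed-length vector of local
        n-gram features (word order beyond the kernel width is lost)."""
        def __init__(self, emb_dim=128, n_filters=64, widths=(2, 3, 4)):
            super().__init__()
            self.convs = nn.ModuleList(
                nn.Conv1d(emb_dim, n_filters, w) for w in widths)

        def forward(self, x):                  # x: (batch, seq_len, emb_dim)
            x = x.transpose(1, 2)              # Conv1d wants (batch, emb_dim, seq_len)
            pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
            return torch.cat(pooled, dim=1)    # (batch, n_filters * len(widths))

    class BiLSTMEncoder(nn.Module):
        """Reads the sequence in both directions, so every hidden state
        carries left and right context; word order is preserved."""
        def __init__(self, emb_dim=128, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                                batch_first=True)

        def forward(self, x):                  # x: (batch, seq_len, emb_dim)
            out, _ = self.lstm(x)              # (batch, seq_len, 2 * hidden)
            return out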

To address the second challenge, matching, most existing work applies cosine similarity, Euclidean distance or Manhattan distance [3], [18], [19], [20]. Though these matching algorithms are fast, they fail to capture hierarchical interactions, which limits performance. Most importantly, the matching algorithm needs to be designed in line with the specific text representation rather than independently. These existing approaches lead to only limited improvement, as they consider just the final information while ignoring the intermediate information in the sentence representation and interaction stages. However, both the intermediate and the final information are crucial for sentence representation and the matching strategy. We argue that hierarchical information deserves more research attention for the SSM task: both the intermediate and final semantic encodings should be utilized in sentence representation and matching algorithms.
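
For reference, the classical matching functions named above can be written in a few lines (a PyTorch sketch over two fixed-length sentence embeddings; note that they compare only the final representations, which is exactly the limitation at issue):

    import torch
    import torch.nn.functional as F

    def classical_similarities(p: torch.Tensor, q: torch.Tensor) -> dict:
        """p, q: (batch, dim) final sentence embeddings."""
        return {
            "cosine": F.cosine_similarity(p, q, dim=1),  # similarity in [-1, 1]
            "euclidean": torch.norm(p - q, p=2, dim=1),  # L2 distance
            "manhattan": torch.norm(p - q, p=1, dim=1),  # L1 distance
        }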

Inspired by work on deep residual learning and multi-view representation learning [17], [21], [22], [23], [24], [25], and to thoroughly address the two challenges discussed above, we propose a hierarchical encoding model (HEM) for sentence representation on the SSM task, further enhanced by a hierarchical matching mechanism for sentence interaction. Given two sentences, P and Q, we first utilize two bidirectional LSTM modules to encode them and concatenate the output embeddings. The concatenated embedding is the intermediate representation of a sentence. For each sentence, the intermediate embedding is then fed into a CNN module, whose output is the final representation. The hierarchical representation therefore comprises the intermediate and final representations. This hierarchical representation is passed to a hierarchical matching layer to generate a matching vector that captures the complex interactions between sentences P and Q. Finally, based on the matching vector, a fully connected multi-layer perceptron is utilized to judge the matching degree between the sentences.
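
A minimal end-to-end sketch of this pipeline follows. It is an illustration under stated assumptions, not the paper's implementation: we use a single shared BiLSTM for brevity, and the pooling of the intermediate representation, the element-wise interaction functions and all dimensions are our own choices; the paper's enhanced BiLSTM and hierarchical matching details may differ.

    import torch
    import torch.nn as nn

    class HEMSketch(nn.Module):
        """BiLSTM -> intermediate representation -> CNN -> final
        representation; both views of P and Q are interacted and the
        resulting matching vector is scored by an MLP."""
        def __init__(self, emb_dim=128, hidden=64, n_filters=64, width=3):
            super().__init__()
            self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                                  batch_first=True)
            self.conv = nn.Conv1d(2 * hidden, n_filters, width)
            match_dim = 2 * (2 * hidden) + 2 * n_filters  # both views, both ops
            self.mlp = nn.Sequential(nn.Linear(match_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 1))

        def encode(self, x):                       # x: (batch, seq, emb_dim)
            inter, _ = self.bilstm(x)              # intermediate: per-token states
            final = torch.relu(self.conv(inter.transpose(1, 2))).max(dim=2).values
            inter_vec = inter.max(dim=1).values    # pool intermediate (assumed)
            return inter_vec, final

        @staticmethod
        def interact(a, b):
            # One simple multi-view interaction: element-wise product and
            # absolute difference (assumed here, for illustration only).
            return torch.cat([a * b, torch.abs(a - b)], dim=1)

        def forward(self, p, q):                   # p, q: (batch, seq, emb_dim)
            p_inter, p_final = self.encode(p)
            q_inter, q_final = self.encode(q)
            match = torch.cat([self.interact(p_inter, q_inter),
                               self.interact(p_final, q_final)], dim=1)
            return torch.sigmoid(self.mlp(match)).squeeze(1)

    # Usage: matching scores in (0, 1) for a batch of pre-embedded pairs.
    model = HEMSketch()
    p, q = torch.randn(8, 20, 128), torch.randn(8, 20, 128)
    scores = model(p, q)                           # shape: (8,)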

In this paper, we detail the hierarchical encoding model (HEM) for the SSM task from methodology to implementation. To verify its superiority, comprehensive experiments are conducted on a real-world public dataset, the BQ corpus [2]. The comparative experimental results demonstrate that HEM significantly outperforms existing state-of-the-art models.

Our contributions are summarized as follows:

  • We propose a novel deep neural architecture for the sentence semantic matching task, which includes an embedding layer, a hierarchical encoding layer, a hierarchical matching layer and a prediction layer. With this architecture, deep hierarchical features and interactions can be captured and utilized to make a more reasonable prediction on sentence matching.

  • We propose a novel hierarchical encoding model (HEM) for sentence representation. HEM exploits both the intermediate and final representations in the encoding layer, which helps capture the deep semantic features within sentences.

  • We propose a new hierarchical matching mechanism for sentence interaction. The mechanism fully considers the multi-view interactions in line with the hierarchical representation generated by HEM.

  • We present an extensive empirical study on the SSM task which demonstrates that our proposed approach outperforms the state-of-the-art models. Our source code is publicly available on GitHub.

The rest of the paper is structured as follows. We introduce the related SSM work in Section 2, and propose our new hierarchical encoding model and hierarchical matching mechanism in Section 3. Section 4 demonstrates the empirical experimental results, followed by the conclusions in Section 5.

Related work

Sentence semantic matching has attracted more and more attention in recent years [1], [26], [27], [28], [29]. Many NLP tasks, such as IR, QA and NLI, can be treated as an SSM problem, which is a key challenge in the NLP community. The key to the SSM task lies in: (1) how to model a sentence and learn its representation, and (2) how to capture the complex interactions and relationships between sentences.

Most of the previous work on sentence representation focuses on manually defined features,

Hierarchical encoding model for sentence matching

We propose a deep hierarchical encoding model for SSM, which includes an embedding layer, an encoding layer, a matching layer and a prediction layer, as shown in Fig. 1. In the embedding layer, the two sentences, P and Q, are converted into vector representations. In the encoding layer, the input sentences are further handled by an enhanced BiLSTM module and a hierarchical encoding module, which generate the intermediate and final representations of the sentences in turn. In the matching layer, in order to capture more

Experiments

To evaluate the performance of our proposed model, HEM and its customized variants are compared with seven state-of-the-art methods for the SSM task on a public dataset.

Conclusion

In this paper, we approach the two key challenges in the SSM task: (1) learning a text representation, and (2) designing a novel matching mechanism to capture complex interactions between sentences. We first propose a novel deep neural architecture for SSM, which is able to capture hierarchical features and interactions between sentences. We then present a hierarchical encoding model (HEM) for sentence representation, which combines the intermediate and final representations together in the encoding layer, to

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The research work is supported by the National Natural Science Foundation of China under Grant No. 61502259, the National Key Research and Development Program of China under Grant No. 2018YFC0831700, and the Taishan Scholar Program of Shandong Province in China (directed by Prof. Yinglong Wang).

References (49)

  • A. Conneau, H. Schwenk, L. Barrault, Y. Lecun, Very deep convolutional networks for text classification, in: ...
  • S. Bai, J.Z. Kolter, V. Koltun, An empirical evaluation of generic convolutional and recurrent networks for sequence ...
  • W. Yin et al., Attentive convolution: equipping CNNs with RNN-style attention mechanisms, Trans. Assoc. Comput. Linguist. (2018)
  • N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for modelling sentences, in: Proceedings ...
  • Q. Zhou et al., Multi-scale deep context convolutional neural networks for semantic segmentation, World Wide Web (2019)
  • D. Zeng et al., Aspect based sentiment analysis by a linguistically regularized CNN with gated mechanism, J. Intell. Fuzzy Syst. (2019)
  • T. Mikolov, M. Karafiát, L. Burget, J. Černockỳ, S. Khudanpur, Recurrent neural network based language model, in: ...
  • R. Johnson, T. Zhang, Supervised and semi-supervised text categorization using LSTM for region embeddings, in: ...
  • C. Wang, F. Jiang, H. Yang, A hybrid framework for text modeling with convolutional RNN, in: Proceedings of the 23rd ...
  • S. Wan, Y. Lan, J. Guo, J. Xu, L. Pang, X. Cheng, A deep architecture for semantic matching with multiple positional ...
  • J. Mueller, A. Thyagarajan, Siamese recurrent architectures for learning sentence similarity, in: Proceedings of the ...
  • Z. Wang, W. Hamza, R. Florian, Bilateral multi-perspective matching for natural language sentences, in: Proceedings of ...
  • K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference ...
  • R. Lan et al., Prior knowledge-based probabilistic collaborative representation for visual recognition, IEEE Trans. Cybern. (2020)