Information Sciences
Volume 543, 8 January 2021, Pages 259-272

Interpretable duplicate question detection models based on attention mechanism

https://doi.org/10.1016/j.ins.2020.07.048

Abstract

Recently, there have been growing concerns about the interpretability of deep learning models, yet few interpretable models have been applied to the duplicate question detection task, which aims at finding semantically equivalent question pairs in question answering forums. In this paper, based on an attention mechanism, we propose two modularized interpretable deep neural network models for this task. During word preprocessing, a filter operation is employed to enhance the relevant information contained in the pre-trained word embeddings. For word matching and sentence representation, vanilla attention and structured attention mechanisms are utilized, respectively. Benefiting from the interpretability of attention techniques, our models can illustrate how words match between sentence pairs and which aspects of the sentences are extracted to influence the final decision. Attention visualization furnishes detailed representations at the word and sentence levels, and experimental results show that our models are comparable with other reported models.

Introduction

With the development of question answering communities such as Quora, Reddit, and Stack Overflow, more and more individuals post and answer questions in such forums. However, many newly posted questions cannot be answered quickly, because few of them receive a proper recommended answer owing to their lack of popularity. In practice, many newly posted questions have answers similar or even identical to those of questions already stored. Specifically, many questions posted in different formulations are semantically equivalent, so their answers can be matched to one another. Therefore, to answer a newly posted question as quickly as possible, we can directly reuse the answers to duplicate or nearly duplicate questions stored in the forum, which brings great convenience to question answering forums.

According to [3], two questions are regarded as a duplicate question pair if they share common answers. This task resembles text similarity identification: whereas the text similarity task produces graded outputs indicating how similar two texts are, duplicate question detection outputs a binary yes/no decision. The task can therefore be viewed as a subtask of text similarity identification that has a binary output and involves only interrogative text.

For the text similarity task, many recently proposed deep learning models have achieved state-of-the-art results [8], [26], [24], [9]. However, the existing methods have the following limitations: (1) In many models, the final matching score is calculated from two independently encoded sentence vectors. Since each sentence pair forms a single input, we argue that processing the two sentences jointly to capture their combined information is a more natural approach. (2) In most sentence pair modeling models, the generated sentence vectors contain only single-aspect information. Yet multi-aspect information can be extracted from one sentence by focusing on different parts of it, which enriches the sentence representation. (3) Interest in model interpretability is growing [15], [4], and interpretability is very helpful for improving the quality of online question answering forums; however, many existing models, such as those proposed in [26], [3], [1], cannot tell us which words or phrases of the two sentences contribute most to the final decision.

To address the aforementioned limitations, in this paper we propose two interpretable attention-based matching models for sentence pairs. Our models first match the words of one sentence against the words of the other, then perform sentence-level interaction to generate similarity vectors (see Fig. 1), on which the final decision is based. The main contributions of this work are summarized as follows: (1) A filter operation is proposed to eliminate redundant information in pre-trained word embeddings (a sketch follows this paragraph). (2) Based on structured attention [14], two models are proposed to enhance sentence-level information. (3) The proposed models are interpretable deep models that illustrate the semantic matching process.
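
The filter operation itself is not shown in this preview. As a rough illustration only, the sketch below assumes it is a learned element-wise sigmoid gate applied to each fixed pre-trained embedding; the class name and the gating form are our assumptions, not necessarily the authors' formulation.

    import torch
    import torch.nn as nn

    class EmbeddingFilter(nn.Module):
        """Hypothetical filter over fixed pre-trained embeddings: a learned
        sigmoid gate that re-weights each dimension, keeping task-relevant
        information and suppressing redundant dimensions (an assumption)."""

        def __init__(self, dim: int = 300):
            super().__init__()
            self.gate = nn.Linear(dim, dim)

        def forward(self, emb: torch.Tensor) -> torch.Tensor:
            # emb: (batch, seq_len, dim) GloVe vectors, kept frozen
            return torch.sigmoid(self.gate(emb)) * emb

Because the gate is learned while the embeddings stay frozen, such a layer would let the model adapt the embeddings to the task without fine-tuning them directly.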

The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 details the proposed model architecture. Sections 4 and 5 describe the datasets and the experimental results. The conclusion and future work are presented in Section 6.

Section snippets

Related work

In natural language processing and information retrieval, a large number of models exist for the sentence pair modeling task, which includes the duplicate question detection task.

For the sentence pair modeling problem, many models apply the recurrent neural network (RNN) architecture. Due to the effectiveness of attention-based models in many NLP tasks [2], [22], some recent models utilize the attention mechanism to capture word-interaction information and improve performance.

The interpretable models

Let $P=\{p_1,p_2,\ldots,p_n\}$ and $Q=\{q_1,q_2,\ldots,q_m\}$ be the sentence pair, where $p_i,q_j \in \mathbb{R}^d$ are pre-trained word embeddings, and $n$ and $m$ are the lengths of $P$ and $Q$, respectively. Their relationship is represented by the label $y$. The overall architecture of the proposed model is depicted in Fig. 1.

For the sake of simplicity, dashed lines are used to illustrate the BiLSTM layer, the sequential processing units that will be formulated in detail. We propose different attention-based matching methods at...
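
This section is truncated in the preview, but the two attention styles the paper names can be sketched in generic form: vanilla (soft-alignment) attention for word-level matching, as popularized by [2], and a structured (multi-hop) self-attention that extracts several aspects per sentence, in the spirit of [14]. The sketch below is our own illustrative reconstruction, not the authors' exact InteMatch/SenMatch; the layer names, number of hops, and composition are assumptions (only the hidden size of 300 comes from Section "Parameter settings").

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentiveMatcher(nn.Module):
        """Illustrative sketch: BiLSTM encoding, vanilla soft alignment
        between the two questions, and structured multi-hop self-attention
        for multi-aspect sentence representations."""

        def __init__(self, dim: int = 300, hidden: int = 300, hops: int = 4):
            super().__init__()
            self.encoder = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
            # Structured attention: A = softmax(W2 tanh(W1 H^T)), one column per aspect
            self.w1 = nn.Linear(2 * hidden, hidden, bias=False)
            self.w2 = nn.Linear(hidden, hops, bias=False)

        def align(self, hp, hq):
            # Vanilla attention: every word of P attends over all words of Q
            scores = torch.bmm(hp, hq.transpose(1, 2))          # (b, n, m)
            p2q = torch.bmm(F.softmax(scores, dim=2), hq)       # Q summary per P word
            q2p = torch.bmm(F.softmax(scores, dim=1).transpose(1, 2), hp)
            return p2q, q2p, scores

        def structured(self, h):
            # Multi-hop self-attention: each hop focuses on a different part
            a = F.softmax(self.w2(torch.tanh(self.w1(h))), dim=1)  # (b, len, hops)
            return torch.bmm(a.transpose(1, 2), h), a              # (b, hops, 2*hidden)

        def forward(self, p_emb, q_emb):
            hp, _ = self.encoder(p_emb)              # (b, n, 2*hidden)
            hq, _ = self.encoder(q_emb)              # (b, m, 2*hidden)
            p2q, q2p, scores = self.align(hp, hq)    # word-level matching
            sp, ap = self.structured(hp)             # sentence-level aspects
            sq, aq = self.structured(hq)
            return scores, ap, aq, sp, sq, p2q, q2p

The matrices scores, ap, and aq are what make such a model interpretable: scores shows which words of P match which words of Q, and each of the hops columns of ap/aq is an attention distribution showing which part of a sentence a given aspect focuses on.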

Datasets

The proposed models (InteMatch and SenMatch) are tested on different datasets collected from different question answering forums.

Parameter settings

The word embeddings are initialized with pre-trained 300-dimensional GloVe word vectors [18], and out-of-vocabulary (OOV) words are randomly initialized. During training, the word embeddings are kept fixed. We employ the Adam optimization method [12] to minimize the cross-entropy loss and update the trainable parameters. The hidden size is set to 300 for the BiLSTM layers. We apply dropout to all feedforward layers, with the dropout ratio set to 0.1. The batch size for Quora...
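
These settings translate into standard PyTorch as follows. The model body and the embedding matrix are placeholders; only the quantities stated above (300-dimensional frozen GloVe embeddings with randomly initialized OOV rows, Adam, cross-entropy loss, dropout 0.1) come from the paper.

    import torch
    import torch.nn as nn

    vocab_size, dim = 50_000, 300                # 300-d GloVe, per the paper
    glove = torch.randn(vocab_size, dim)         # stand-in; real setup loads GloVe,
                                                 # with OOV rows randomly initialized
    embedding = nn.Embedding.from_pretrained(glove, freeze=True)  # kept fixed

    model = nn.Sequential(                       # placeholder for InteMatch/SenMatch
        nn.Linear(dim, 300), nn.ReLU(),
        nn.Dropout(0.1),                         # dropout 0.1 on feedforward layers
        nn.Linear(300, 2),                       # binary duplicate / not-duplicate
    )
    optimizer = torch.optim.Adam(model.parameters())   # Adam optimization [12]
    criterion = nn.CrossEntropyLoss()                  # cross-entropy loss

    def train_step(token_ids: torch.Tensor, labels: torch.Tensor) -> float:
        feats = embedding(token_ids).mean(dim=1)  # mean pooling stands in for the encoder
        loss = criterion(model(feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()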

Conclusion and future work

In this paper, we propose two interpretable sentence matching models based on an attention mechanism. Compared with other methods (with or without attention mechanisms), the proposed models achieve comparable performance on the large-scale duplicate question detection dataset and the best performance on almost all of the small-scale datasets.

In our models, we apply a filter operation to preprocess the pre-trained word embeddings, which sidesteps the question of whether to update the word embeddings during the training phase.

CRediT authorship contribution statement

Qifeng Zhou: Conceptualization, Methodology, Writing - review & editing, Supervision. Xiang Liu: Methodology, Software, Writing - original draft. Qing Wang: Writing - review & editing, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the Natural Science Foundation of Fujian Province, China, under Grant No. 2017J01118 and by the Shenzhen Science and Technology Planning Program under Grant No. JCYJ20170307141019252.

References (27)

  • J. Antonio Rodrigues, C. Saedi, V. Maraev, J. Silva, A. Branco, Ways of asking and replying in duplicate question...
  • D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate. CoRR...
  • D. Bogdanova et al., Detecting semantically equivalent questions in online user forums
  • S. Chakraborty, R. Tomsett, R. Raghavendra, D. Harborne, M. Alzantot, F. Cerutti, M.B. Srivastava, A.D. Preece, S....
  • Q. Chen et al., Enhanced LSTM for natural language inference
  • A. Conneau et al., Supervised learning of universal sentence representations from natural language inference data
  • J. Devlin et al., BERT: Pre-training of deep bidirectional transformers for language understanding
  • R. Ghaeini et al., DR-BiLSTM: Dependent reading bidirectional LSTM for natural language inference
  • Y. Gong, H. Luo, J. Zhang, Natural language inference over interaction space. CoRR abs/1709.04348, 2017...
  • H. He, J. Lin, Pairwise word interaction modeling with deep neural networks for semantic similarity measurement, in:...
  • E. Hoffer et al., Deep metric learning using triplet network
  • D.P. Kingma, J. Ba, Adam: A method for stochastic optimization. CoRR abs/1412.6980, 2014...
  • W. Lan et al., Neural network models for paraphrase identification, semantic textual similarity, natural language inference, and question answering, in:...
