Abstract
Deception in multi-turn question answering (QA) settings such as interviews, court depositions, and online marketplaces can have serious consequences. Owing to the lack of proper datasets and the difficulty of finding deceptive signals, existing deception detection methods have not utilized QA contexts to detect deception. Previous methods, which mainly focus on context-free deception detection, cannot be applied to text-based QA contexts. We therefore design a novel Context Selector Network (CSN) to address the challenge of modeling the context-sensitive dependencies implied in multi-turn QA. We first use BERT to obtain sentence embeddings and then design a context selector that automatically explores crucial deceptive signals implied in the QA contexts. Targeting real-life scenarios, we collect a high-quality dataset of multi-turn QAs consisting of sequentially dependent QA pairs. Experimental results against several state-of-the-art baselines show the strong effectiveness of the proposed model.
1 Introduction
Deception often occurs in daily life and can cause severe consequences and losses to individuals and society. Automatic deception detection for multi-turn QA can benefit many applications, such as criminal interrogation, court depositions, interviews, and online marketplaces. However, text-based contextual deception detection has not been sufficiently explored [29], mainly due to the lack of proper datasets and the difficulty of finding deceptive signals. To alleviate this problem, we focus on deception detection in multi-turn QA, which aims to classify each QA pair as deceptive or not through analysis of the context.
Existing deception detection methods rely heavily on hand-crafted features, including verbal [7, 11, 19, 21, 22, 27, 31, 32] and non-verbal [6, 18, 25, 28] cues explored from different modalities; they ignore the semantic information implied in contexts and cannot be applied to multi-turn QA data. Tasks such as dialogue systems [15] and multi-turn question answering [14, 30] seem similar to ours. However, they are not classification tasks and cannot be directly applied to our task, which is formulated as sentence-pair classification. It is therefore necessary to propose a novel approach for recognizing deceptive QA pairs.
Intuitively, the information implied in contexts is needed to understand the subjective beliefs of a speaker, which is an essential step in detecting deceit [13]. For example, without the given contexts we cannot judge which QA pairs in Table 1 are deceptive. Furthermore, the features of deception are implicit and difficult to detect. Due to the sparsity and complexity of latent deceptive signals, treating all context information equally obstructs model performance. As shown in Table 1, Turn-5 is relatively less relevant to Turn-2, while Turn-1, 3, and 4 are closely related to Turn-2. Taking all of the context into account is therefore likely to hurt the model's ability to recognize deception.
We propose two hypotheses: (1) QA context is conducive to detecting deceit; (2) noise implied in the QA context hinders the accurate identification of deception. To test these two hypotheses, we use BERT [5] to obtain context-independent sentence embeddings and BiGRU [3] to obtain context-aware sentence embeddings. More importantly, a novel context selector is proposed to filter out noise in the contexts. Due to the lack of a proper dataset, we construct a multi-turn QA dataset containing sequentially dependent QA pairs for the experiments. We design questionnaires covering six daily-life topics to collect deceptive and non-deceptive data. Our contributions are:
(1) We make the first attempt to tackle the multi-turn QA-style deception detection problem, and design a novel Context Selector Network (CSN) to effectively explore deceptive signals implied in the contexts.
(2) To fill the gap of deception detection in multi-turn QA, a newly collected dataset Deception QA is presented for the target task.
(3) Compared with several deep learning-based baselines, our model achieves the best performance on the collected dataset, demonstrating its effectiveness.
2 Related Work
2.1 Deception Detection
To address the problem of automatic deception detection, researchers have carried out a series of studies in different scenarios, such as social networks and daily life.
Social network-based deception detection has been studied for a long time in the research community. Most approaches utilize the propagation patterns of deceptive information [2] and the interactions between multiple users [17] to detect deception. However, these features do not exist in multi-turn QA under daily-life situations, so these methods cannot be applied directly to this new kind of task.
In addition, deception often occurs in daily-life scenarios. Researchers have analyzed the features that can be used to detect deception in such settings; these features can be classified as linguistic features and interactions between individuals.
Linguistic Features: Several studies have shown the effectiveness of features derived from text analysis, which include basic linguistic representations such as n-grams and Linguistic Inquiry and Word Count (LIWC) [19, 22], and more complex linguistic features derived from syntactic CFG trees and part-of-speech tags [7, 31]. Based on these findings, many studies have focused on text-based methods, recognizing deceptive language in games [1, 27], online reviews [20], news articles [26], and interviews [12].
Interactions Between Individuals: Apart from linguistic features implied in texts, interactions between individuals can also have a beneficial effect on detecting deceit. Tsunomori et al. [29] examined the effect of question types and individuals' behaviors on participants. Their findings show that specific questions led to more salient deceptive behavior patterns in participants, resulting in better deception detection performance.
These studies show that linguistic features and interactions between individuals in contexts contribute to deception detection. Therefore, deception detection in text-based multi-turn QA is significant and reasonable. Although deceptive behavior often occurs in multi-turn QA under daily-life situations, due to the difficulty of finding deceptive signals and of collecting and annotating deception data, no work has been done on cues of deception drawn from text-based QA contexts. Unlike all prior studies, this paper focuses on a novel task: deception detection in multi-turn QA. To the best of our knowledge, our work is the first attempt to perform deception detection in this setting.
2.2 Datasets Comparison
A few datasets based on different modalities have been developed for deception detection, such as text-based datasets [19, 21, 27, 32], audio-based datasets [9, 11], and multimodal datasets [24, 25, 28].
Several researchers have proposed text-based datasets for deception detection. Ott et al. [21] developed the Ott Deceptive Opinion Spam corpus, which consists of 800 true reviews and 800 deceptive reviews. Mihalcea and Strapparava [19] collected data from three written deception tasks. Zhou and Sung [32] collected 1192 Mafia games from a popular Chinese website. de Ruiter and Kachergis [27] proposed the Mafiascum dataset, a collection of over 700 games of Mafia.
In addition to text-based datasets, some studies have developed audio-based datasets. Hirschberg et al. [9] were the first to propose an audio-based corpus, which consists of 32 interviews averaging 30 minutes. Levitan et al. [11] collected a much larger corpus. However, these two datasets are not publicly available, and it is hard to model contextual semantics from the audio modality alone.
The multimodal datasets were all collected from public multimedia sources, such as public court trials [24], street interviews aired on television shows [25], and the Box of Lies game on a TV show [28]. These data cannot be annotated by the people who actually express deception or non-deception; the researchers labeled the data themselves after collection, which may introduce human bias. Existing public multimedia sources also cannot provide adequate labeled samples for deep learning-based deception detection methods. Moreover, compared with text data, processing multimodal data requires more computing resources.
3 Model
3.1 Problem Formalization
Suppose that we have a dataset \(D=\{U_i, Y_i\}_{i=1}^{N}\), where \(U_i=\{q_{il}, a_{il}\}_{l=1}^{L}\) represents a multi-turn QA with L QA pairs and every sentence in a multi-turn QA contains T words. N is the number of multi-turn QAs in the dataset. \(Y_i=\{y_{il}\}_{l=1}^{L}\), where \(y_{il}\in \{0, 1\}\) denotes the label of a QA pair: \(y_{il}=1\) means \(\{q_{il}, a_{il}\}\) is deceptive, otherwise \(y_{il}=0\). Given the dataset, the goal of deception detection is to learn a classifier \(f: U\rightarrow Y\), where U and Y are the sets of QA pairs and labels respectively, to predict the label of each QA pair based on the context information in a multi-turn QA.
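As a concrete illustration of this setup (the variable names and the toy example below are ours, not drawn from the dataset), a multi-turn QA and its labels can be represented as:

```python
# Illustrative sketch of the problem formalization; the example content
# is invented for demonstration and is not from the Deception QA dataset.
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]  # (question q_il, answer a_il)

def make_example() -> Tuple[List[QAPair], List[int]]:
    """One toy multi-turn QA U_i with L = 3 sequentially dependent pairs
    and its label sequence Y_i (1 = deceptive, 0 = truthful)."""
    U = [
        ("Do you like billiards?", "Yes, I have played for years."),
        ("How often do you play?", "Almost every weekend."),
        ("What was your best result?", "I once beat a club champion."),
    ]
    Y = [0, 0, 1]  # only the last answer is a lie
    return U, Y

# The learning goal is a classifier f: U -> Y that labels every QA pair
# while exploiting the context of the whole multi-turn QA.
Classifier = Callable[[List[QAPair]], List[int]]
```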
3.2 Model Overview
We propose CSN, which first generates context-independent sentence embeddings, then selects contexts for the target question and answer respectively to filter out noise, and finally utilizes the context encoder to obtain context-aware sentence embeddings. As illustrated in Fig. 1, the proposed model consists of a Word Encoder, a Context Selector, a Context Encoder, and a Question Answer Pair Classifier.
3.3 Word Encoder
Since the data are collected by designing questions first and then collecting corresponding answers, we treat a multi-turn QA as a combination of one question sequence and one answer sequence. The \(l\)-th question and answer with T words in the \(i\)-th multi-turn QA are defined as \(\{w^Q_{l1}, \ldots, w^Q_{lT}\}\) and \(\{w^A_{l1}, \ldots, w^A_{lT}\}\) respectively. We feed both sentences into pre-trained BERT and obtain context-independent sentence embeddings, denoted \(g^Q_{l}\) and \(g^A_{l}\) for the question and answer respectively. In the experiments, we also replace BERT with BiGRU to verify the effectiveness of BERT.
3.4 Context Selector
Given a multi-turn QA and its sentence representations, we treat the questions and answers as two contexts: \(\mathcal{Q}=\{g^Q_{l}\}_{l=1}^{L}\), \(\mathcal{A}=\{g^A_{l}\}_{l=1}^{L}\). We design a context selector to select contexts for the target question and answer respectively, in order to eliminate the influence of noise in the context.
We treat the answer of the QA pair to be predicted as the key \(g^A_{l}\) to select the corresponding answer contexts. We use cosine similarity to measure text similarity between the answer key \(g^A_{l}\) and each sentence \(g^A_{j}\) in the answer context \(\mathcal{A}\), which is formulated as:

$$s_{A_{l}} = \left[ \frac{g^A_{l} \cdot g^A_{j}}{\lVert g^A_{l} \rVert \, \lVert g^A_{j} \rVert} \right]_{j=1}^{L}$$

where \(s_{A_{l}}\) is the vector of relevance scores.
Then we use the scores to form a mask matrix for each answer and assign the same mask matrix to the question contexts, aiming to retain the consistency of the masked answer sequence and question sequence, which is formulated as:

$$M_{l} = \mathbb{1}\left[\sigma(s_{A_{l}}) > \gamma\right], \qquad A_{l} = M_{l} \odot \mathcal{A}, \qquad Q_{l} = M_{l} \odot \mathcal{Q}$$

where \(\odot\) is element-wise multiplication, \(\sigma\) is the sigmoid function, and \(\gamma\) is a threshold tuned according to the dataset. Sentences whose scores are below \(\gamma\) are filtered out. \(Q_{l}\) and \(A_{l}\) are the final contexts for \(q_{l}\) and \(a_{l}\).
The context selector makes the model focus on the more relevant contexts by filtering out noisy contexts, and thus helps the model explore the context-sensitive dependencies implied in the multi-turn QA.
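As a rough sketch of this selection step (our reading of the description above, not the authors' released code; the embedding shapes are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def select_contexts(Q, A, l, gamma=0.63):
    """Sketch of the context selector. Q, A: (L, d) arrays of question
    and answer sentence embeddings; l: index of the target QA pair;
    gamma: relevance threshold. The target answer embedding acts as the
    key; cosine similarity against every context answer yields relevance
    scores, and one shared mask is applied to both sequences so the
    masked question and answer sequences stay consistent."""
    key = A[l]
    # cosine similarity between the answer key and each context answer
    scores = A @ key / (np.linalg.norm(A, axis=1) * np.linalg.norm(key) + 1e-8)
    mask = (sigmoid(scores) > gamma).astype(A.dtype)  # shape (L,)
    return mask[:, None] * Q, mask[:, None] * A, mask
```

With gamma = 0.63 (the value tuned on validation data in Sect. 5.1), a context sentence whose similarity to the key is near zero gets a sigmoid score around 0.5 and is filtered out.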
3.5 Context Encoder
Given the selected contexts of the target question and answer, we feed them to two BiGRUs respectively:

$$\tilde{Q}_{l} = \mathrm{BiGRU}_{Q}(Q_{l}), \qquad \tilde{A}_{l} = \mathrm{BiGRU}_{A}(A_{l})$$

where \(\tilde{Q}_{l}\) and \(\tilde{A}_{l}\) are the outputs at the positions of \(q_{l}\) and \(a_{l}\) in the two bidirectional GRUs, i.e., the context-aware embeddings of \(q_{l}\) and \(a_{l}\) respectively.
We use the two context encoders to model context dependencies between multiple answers and questions respectively. In this way, we can make full use of deceptive signals implied in the contexts to recognize deceptive QA pairs.
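A minimal NumPy stand-in for the two context encoders (parameter shapes and gate equations follow the standard GRU; this is an illustration, not the authors' implementation):

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One standard GRU cell step; W: (3, d_h, d_in), U: (3, d_h, d_h),
    b: (3, d_h) hold the update, reset, and candidate parameters."""
    z = _sigmoid(W[0] @ x + U[0] @ h + b[0])        # update gate
    r = _sigmoid(W[1] @ x + U[1] @ h + b[1])        # reset gate
    n = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])   # candidate state
    return (1.0 - z) * n + z * h

def bigru(X, params_f, params_b):
    """Run forward and backward GRUs over X of shape (L, d_in) and
    concatenate the two hidden states at each position: (L, 2 * d_h)."""
    L = X.shape[0]
    d_h = params_f[2][0].shape[0]
    hf, hb = np.zeros(d_h), np.zeros(d_h)
    fwd, bwd = [], [None] * L
    for t in range(L):
        hf = gru_step(X[t], hf, *params_f)
        fwd.append(hf)
    for t in reversed(range(L)):
        hb = gru_step(X[t], hb, *params_b)
        bwd[t] = hb
    return np.concatenate([np.stack(fwd), np.stack(bwd)], axis=1)
```

Two independent parameter sets play the roles of \(\mathrm{BiGRU}_{Q}\) and \(\mathrm{BiGRU}_{A}\); in practice a framework implementation such as `torch.nn.GRU(bidirectional=True)` would replace this loop.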
3.6 Question Answer Pair Classifier
Then, the context-aware embeddings of the target question and answer are concatenated to obtain the final QA pair representation:

$$r_{l} = [\tilde{Q}_{l}; \tilde{A}_{l}]$$

Finally, the representation of the QA pair is fed into a softmax classifier:

$$\hat{y}_{l} = \mathrm{softmax}(W r_{l} + b)$$

where W and b are trainable parameters.
The loss function is defined as the cross-entropy error over all labeled QA pairs:

$$\mathcal{L} = -\sum_{i=1}^{N}\sum_{l=1}^{L} \log \hat{y}_{il}^{\,(y_{il})}$$

where N is the number of multi-turn QAs, L is the number of QA pairs in a multi-turn QA, \(y_{il}\) is the ground-truth label of the QA pair, and \(\hat{y}_{il}^{\,(y_{il})}\) is the predicted probability of that label.
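A minimal sketch of the classifier and loss, under our assumption that the two context-aware embeddings are simply concatenated and passed through one linear-softmax layer:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_pair(q_emb, a_emb, W, b):
    """Concatenate the context-aware question and answer embeddings and
    map them to class probabilities [truthful, deceptive]; W and b are
    the trainable parameters."""
    r = np.concatenate([q_emb, a_emb])
    return softmax(W @ r + b)

def cross_entropy(probs, labels):
    """Cross-entropy over all labeled QA pairs.
    probs: (n, 2) predicted distributions; labels: (n,) gold labels."""
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).sum()
```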
4 Deception QA Dataset Design
Our goal is to build a Chinese text-based collection of deceptive and non-deceptive data in the form of multi-turn QA, which allows us to analyze contextual dependencies between QA pairs with respect to deception. We design questionnaires related to different daily-life topics and then recruit subjects to answer these questions.
4.1 Questionnaires Design
To collect deceptive and non-deceptive data, we design six questionnaires covering six daily-life topics: sports, music, tourism, film and television, school, and occupation. The number of questions in each questionnaire varies from seven to ten. Specifically, the first question in each questionnaire is directly related to the corresponding theme, as shown in Table 1. The following questions are subtly designed so that they can be viewed as follow-ups to the first question, with progressive dependencies between them.
4.2 Answers Collection
To obtain deceptive and non-deceptive data, we recruit 318 subjects from universities and companies to fill in the six questionnaires. The numbers of collected multi-turn QAs for each theme are 337, 97, 49, 53, 51, and 49 respectively.
Each subject is asked to answer the same questionnaire twice to keep the distribution of deceptive and non-deceptive data as balanced as possible. The first time, subjects must tell the truth in response to the first question; the second time, they must lie in response to the same first question. Subjects may answer the following questions truthfully or deceptively as they wish, but their final goal is to convince others that all of their answers are true. Because the questions in a questionnaire are sequentially dependent, forcing subjects to change only their answer to the first question helps them better organize their expression when answering the following questions. To motivate subjects to produce high-quality deceptive and non-deceptive answers, we give them monetary rewards.
Similar to previous work [11], we ask the subjects to label their own answers with “T” or “F”, where “T” means the answer is truthful and “F” means it is deceptive.
4.3 Train/Dev/Test Split
We finally obtain 636 multi-turn QAs and 6113 QA pairs. After shuffling all of the multi-turn QAs, we randomly divide the data into train, development, and test sets with a ratio of 8:1:1. Table 2 shows the dataset statistics.
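The dialogue-level shuffle and 8:1:1 split described above can be sketched as follows (the random seed is our choice for illustration):

```python
import random

def split_dataset(dialogues, seed=42):
    """Shuffle multi-turn QAs and split them 8:1:1 into train/dev/test
    at the dialogue level, so all QA pairs of one multi-turn QA stay in
    the same split."""
    data = list(dialogues)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train, n_dev = int(n * 0.8), int(n * 0.1)
    return data[:n_train], data[n_train:n_train + n_dev], data[n_train + n_dev:]
```

Applied to the 636 collected multi-turn QAs, this yields 508/63/65 dialogues; the exact sizes in Table 2 may differ slightly depending on rounding.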
5 Experiments
5.1 Experimental Settings
The Deception QA dataset is in Chinese. Jieba is employed to segment text into Chinese words, and GloVe [23] is employed to obtain pre-trained word embeddings. Moreover, we use Chinese BERT and RoBERTa with whole word masking [4]. For the context selector, \(\gamma\) is set to 0.63 according to the validation data. Performance is evaluated using standard Macro-Precision, Macro-Recall, and Macro-F1.
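The macro-averaged metrics can be computed without external libraries; the following plain re-implementation (not the authors' evaluation script) makes the definitions explicit:

```python
def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Macro-averaged precision, recall, and F1 over the two classes
    (non-deceptive / deceptive): per-class scores are computed from the
    confusion counts and then averaged with equal class weight."""
    ps, rs, fs = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec)
        rs.append(rec)
        fs.append(f1)
    k = len(classes)
    return sum(ps) / k, sum(rs) / k, sum(fs) / k
```

`sklearn.metrics.precision_recall_fscore_support(..., average='macro')` computes the same quantities.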
5.2 Baselines
The baselines are divided into two groups according to whether they take the context into consideration. Without considering the context, we compare our model with general text classification approaches: BiGRU [3], TextCNN [10], BERT [5], and RoBERTa [16]. Considering the context, we use BiGRU-CC, attBiGRU-CC, TextCNN-BiGRU, and DialogueGCN [8], where CC means considering all the contexts, and DialogueGCN is the state-of-the-art model for emotion recognition in conversation. We propose CSN and CSN-BERT/-RoBERTa, which have a carefully designed context selector to filter noise in the context.
5.3 Results and Analysis
The results in Table 3 can be divided into three parts. From top to bottom, they show methods that do not consider the contexts, methods that consider all the contexts, and methods that perform context selection.
From the first part, we find that methods based on pre-trained language models (PLMs) almost always outperform general text classification models. From the second part, we find that approaches considering the contexts perform much better than those that do not. This confirms the usefulness of the QA context for detecting deception.
Our proposed model achieves the best performance among all of the strong baselines. The Macro-F1 score of CSN-RoBERTa is 5.65% higher than that of RoBERTa and 6.61% higher than that of DialogueGCN. Compared with the other sequence-based approaches without the context selector, the Macro-F1 score of CSN-RoBERTa is 11.26% higher on average. This indicates that taking all of the contexts, including noise, into account can hurt model performance: besides context information, noise is another key factor that affects the model's ability to recognize deception. These results demonstrate the effectiveness of our model.
From the experimental results in Table 4, we observe that removing the context selector degrades performance. The ablation study on three models shows that the Macro-F1 scores of models using the context selector are 3.02% higher on average than those of models without it. This proves that the proposed context selector does help improve the model's ability to distinguish deceptive from non-deceptive QA pairs in a multi-turn QA.
5.4 Case Study
Table 5 shows an example in which CSN-RoBERTa correctly predicted Turn-5 as deception by masking Turn-2, Turn-8, and Turn-9, while RoBERTa-BiGRU-CC, which takes all of the contexts into consideration, misclassified Turn-5.
In this example, the masked contexts can be regarded as noise that is less relevant to Turn-5. Turn-2 talked about when the subject came to like billiards, which is relatively irrelevant to the subject's experience in the game. Turn-7, Turn-8, and Turn-9 all talked about star players, which provide no effective information for judging whether Turn-5 is deceptive. Due to model inaccuracy, only Turn-2, Turn-8, and Turn-9 are masked, while Turn-7 is retained. Such noisy context can confuse a model that takes all contexts into account, making it unable to classify Turn-5 correctly.
6 Conclusion
In this paper, we propose a novel task, deception detection in multi-turn QA, together with a Context Selector Network to model context-sensitive dependencies. In addition, we build a high-quality dataset for the experiments. Empirical evaluation on the collected dataset shows that our approach significantly outperforms several strong baselines, indicating that the QA contexts and the context selector help the model effectively explore deceptive features. In the future, we would like to integrate user information to explore deeper deceptive signals in multi-turn QA.
References
Azaria, A., Richardson, A., Kraus, S.: An agent for deception detection in discussion based environments. In: CSCW, pp. 218–227 (2015)
Bian, T., et al.: Rumor detection on social media with bi-directional graph convolutional networks. In: AAAI, pp. 549–556 (2020)
Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: EMNLP, pp. 1724–1734 (2014)
Cui, Y., et al.: Pre-training with whole word masking for Chinese BERT. arXiv:1906.08101 (2019)
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT, pp. 4171–4186 (2019)
Ding, M., Zhao, A., Lu, Z., Xiang, T., Wen, J.R.: Face-focused cross-stream network for deception detection in videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7802–7811 (2019)
Feng, S., Banerjee, R., Choi, Y.: Syntactic stylometry for deception detection. In: ACL, pp. 171–175 (2012)
Ghosal, D., Majumder, N., Poria, S., Chhaya, N., Gelbukh, A.F.: DialogueGCN: a graph convolutional neural network for emotion recognition in conversation. In: EMNLP-IJCNLP, pp. 154–164 (2019)
Hirschberg, J., et al.: Distinguishing deceptive from non-deceptive speech. In: INTERSPEECH, pp. 1833–1836 (2005)
Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP, pp. 1746–1751 (2014)
Levitan, S.I., et al.: Cross-cultural production and detection of deception from speech. In: WMDD@ICMI, pp. 1–8 (2015)
Levitan, S.I., Maredia, A., Hirschberg, J.: Linguistic cues to deception and perceived deception in interview dialogues. In: NAACL-HLT, pp. 1941–1950 (2018)
Li, D., Santos Jr., E.: Discriminating deception from truth and misinformation: an intent-level approach. J. Exp. Theor. Artif. Intell. 32(3), 373–407 (2020)
Li, X., et al.: Entity-relation extraction as multi-turn question answering. In: ACL, pp. 1340–1350 (2019)
Liu, L., Zhang, Z., Zhao, H., Zhou, X., Zhou, X.: Filling the gap of utterance-aware and speaker-aware representation for multi-turn dialogue. In: AAAI, pp. 13406–13414 (2021)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692 (2019)
Lu, Y., Li, C.: GCAN: graph-aware co-attention networks for explainable fake news detection on social media. In: ACL, pp. 505–514 (2020)
Mathur, L., Mataric, M.J.: Unsupervised audio-visual subspace alignment for high-stakes deception detection. In: ICASSP, pp. 2255–2259 (2021)
Mihalcea, R., Strapparava, C.: The lie detector: Explorations in the automatic recognition of deceptive language. In: ACL, pp. 309–312 (2009)
Ott, M., Cardie, C., Hancock, J.T.: Negative deceptive opinion spam. In: HLT-NAACL, pp. 497–501 (2013)
Ott, M., Choi, Y., Cardie, C., Hancock, J.T.: Finding deceptive opinion spam by any stretch of the imagination. In: ACL, pp. 309–319 (2011). https://www.aclweb.org/anthology/P11-1032/
Pennebaker, J.W., Francis, M.E., Booth, R.J.: Linguistic Inquiry and Word Count. Lawerence Erlbaum Associates, Mahwah, NJ (2001)
Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)
Pérez-Rosas, V., Abouelenien, M., Mihalcea, R., Burzo, M.: Deception detection using real-life trial data. In: ICMI, pp. 59–66 (2015)
Pérez-Rosas, V., Abouelenien, M., Mihalcea, R., Xiao, Y., Linton, C.J., Burzo, M.: Verbal and nonverbal clues for real-life deception detection. In: EMNLP, pp. 2336–2346 (2015)
Pisarevskaya, D.: Deception detection in news reports in the Russian language: lexics and discourse. In: NLPmJ@EMNLP, pp. 74–79 (2017)
de Ruiter, B., Kachergis, G.: The Mafiascum dataset: a large text corpus for deception detection. arXiv:1811.07851 (2018)
Soldner, F., Pérez-Rosas, V., Mihalcea, R.: Box of lies: Multimodal deception detection in dialogues. In: Burstein, J., Doran, C., Solorio, T. (eds.) NAACL-HLT, pp. 1768–1777 (2019)
Tsunomori, Y., Neubig, G., Sakti, S., Toda, T., Nakamura, S.: An analysis towards dialogue-based deception detection. In: Natural Language Dialog Systems and Intelligent Assistants, pp. 177–187 (2015)
Wang, X.D., Weber, L., Leser, U.: Biomedical event extraction as multi-turn question answering. In: LOUHI@EMNLP, pp. 88–96 (2020)
Xu, Q., Zhao, H.: Using deep linguistic features for finding deceptive opinion spam. In: COLING, pp. 1341–1350 (2012)
Zhou, L., Sung, Y.: Cues to deception in online Chinese groups. In: HICSS-41, p. 146 (2008)
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Bao, Y., Ma, Q., Wei, L., Wang, D., Zhou, W., Hu, S. (2022). Deception Detection Towards Multi-turn Question Answering with Context Selector Network. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13629. Springer, Cham. https://doi.org/10.1007/978-3-031-20862-1_22