Using linguistically defined specific details to detect deception across domains

Nikolai Vogler; Lisa Pearl

doi:10.1017/S1351324919000408


Published online by Cambridge University Press: 01 August 2019


Nikolai Vogler*: Language Science and Cognitive Sciences, University of California, 3151 Social Science Plaza A, Irvine, CA 92697, USA
Lisa Pearl: Language Science and Cognitive Sciences, University of California, 3151 Social Science Plaza A, Irvine, CA 92697, USA
*Corresponding author. Email: nikolai.vogler@gmail.com


Abstract

Current automatic deception detection approaches tend to rely on cues that are based either on specific lexical items or on linguistically abstract features that are not necessarily motivated by the psychology of deception. Notably, while approaches relying on such features can do well when the content domain is similar for training and testing, they suffer when content changes occur. We investigate new linguistically defined features that aim to capture specific details, a psychologically motivated aspect of truthful versus deceptive language that may be diagnostic across content domains. To ascertain the potential utility of these features, we evaluate them on data sets representing a broad sample of deceptive language, including hotel reviews, opinions about emotionally charged topics, and answers to job interview questions. We additionally evaluate these features as part of a deception detection classifier. We find that these linguistically defined specific detail features are most useful for cross-domain deception detection when the training data differ significantly in content from the test data, and particularly benefit classification accuracy on deceptive documents. We discuss implications of our results for general-purpose approaches to deception detection.

Keywords

Deception detection; Cross-domain; Specific details; Linguistic features

Type: Article
Information: Natural Language Engineering, Volume 26, Issue 3, May 2020, pp. 349–373

DOI: https://doi.org/10.1017/S1351324919000408
Copyright: © Cambridge University Press 2019


