
Using linguistically defined specific details to detect deception across domains

Published online by Cambridge University Press:  01 August 2019

Nikolai Vogler*
Affiliation:
Language Science and Cognitive Sciences, University of California, 3151 Social Science Plaza A, Irvine, CA 92697, USA
Lisa Pearl
Affiliation:
Language Science and Cognitive Sciences, University of California, 3151 Social Science Plaza A, Irvine, CA 92697, USA
*Corresponding author. Email: nikolai.vogler@gmail.com

Abstract

Current automatic deception detection approaches tend to rely on cues that are based either on specific lexical items or on linguistically abstract features that are not necessarily motivated by the psychology of deception. Notably, while approaches relying on such features can do well when the content domain is similar for training and testing, they suffer when content changes occur. We investigate new linguistically defined features that aim to capture specific details, a psychologically motivated aspect of truthful versus deceptive language that may be diagnostic across content domains. To ascertain the potential utility of these features, we evaluate them on data sets representing a broad sample of deceptive language, including hotel reviews, opinions about emotionally charged topics, and answers to job interview questions. We additionally evaluate these features as part of a deception detection classifier. We find that these linguistically defined specific detail features are most useful for cross-domain deception detection when the training data differ significantly in content from the test data, and particularly benefit classification accuracy on deceptive documents. We discuss implications of our results for general-purpose approaches to deception detection.

Type
Article
Copyright
© Cambridge University Press 2019

