Elsevier

Knowledge-Based Systems

Volume 249, 5 August 2022, 108952

A lexical psycholinguistic knowledge-guided graph neural network for interpretable personality detection

https://doi.org/10.1016/j.knosys.2022.108952

Abstract

With the blossoming of online social media, personality detection based on user-generated content has a significant impact on applications in information science and industry. Most existing approaches rely heavily on semantic features or superficial psycholinguistic statistical features calculated by existing tools and fail to effectively exploit psycholinguistic knowledge that can help determine and interpret people's personality traits. In this paper, we propose a novel lexical psycholinguistic knowledge-guided graph neural model for interpretable personality detection, which leverages personality lexicons as a bridge for injecting relevant external knowledge to enrich the semantics of a document. Specifically, we learn a kind of personality-aware word embedding that encodes psycholinguistic information in the continuous representations of words. Then, a heterogeneous personality word graph is constructed by aligning the personality lexicons with the personality knowledge graph and is fed into a Heterogeneous Personality Message-passing graph neural Network (HPMN), which extracts explicit lexicon and knowledge relations through the interactions among heterogeneous graph nodes. Finally, through a carefully designed readout function, all heterogeneous nodes are selectively incorporated as knowledge-guided document embeddings for user-generated text personality understanding and interpretation. Experiments show that our model effectively detects personality traits. Moreover, it provides a certain level of support for lexical hypotheses in psycholinguistic research from a computational linguistics perspective.

Introduction

With the rapid development of social media platforms, large amounts of user-generated content (UGC) can be accessed and analyzed to automatically identify authors' personality traits. Many studies have shown that automatic personality detection systems play an essential role in various applications, such as user interest mining [1], information dissemination [2], recommendation systems [3], [4], [5], and intelligent machine design [6]. Therefore, analyzing and detecting users’ personality traits is significant for grasping users’ current and future psychological states and predicting their reactions and behaviors.

Personality detection research based on user-generated text mainly falls into three lines of work: psycholinguistic lexicon-based methods, neural language model-based methods, and interpretability research. Earlier researchers extracted psycholinguistic lexicon statistical features, such as Linguistic Inquiry and Word Count (LIWC) [7] and Medical Research Council (MRC) [8] features, from texts for personality detection [9], [10]. However, obtaining such handcrafted features is costly, and statistical analysis cannot effectively represent the original semantics. To avoid feature engineering, deep neural models are employed to learn distributed text representations end to end, and the resulting detection accuracy is greatly improved [11], [12], [13]. However, neural language model embeddings lack the ability to explain personality. Recently, some researchers combined common knowledge to detect personality [14], [15], providing some ability to explain personality and contributing to the analysis of personality traits. More recent work employed interpretable machine learning to clearly quantify the impacts of various psycholinguistic statistical features [16], [17]. However, these methods do not deeply exploit psycholinguistic domain knowledge and fail to effectively integrate psycholinguistic knowledge and text semantics into the associated neural models.
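To make the idea of lexicon statistical features concrete, the following is a minimal sketch of how category frequencies might be computed from a text. The lexicon `TOY_LEXICON`, its category names, and the function `lexicon_features` are hypothetical and serve only as an illustration; they are not the LIWC or MRC resources, whose word lists are not reproduced here.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> category. These entries are assumptions
# for illustration and are NOT taken from LIWC or MRC.
TOY_LEXICON = {
    "hate": "negative_emotion",
    "hell": "negative_emotion",
    "love": "positive_emotion",
    "we": "social",
    "friend": "social",
}


def lexicon_features(text: str) -> dict[str, float]:
    """Return the fraction of tokens that fall into each lexicon category."""
    tokens = text.lower().split()
    counts = Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)
    total = len(tokens) or 1  # avoid division by zero on empty input
    categories = sorted(set(TOY_LEXICON.values()))
    return {c: counts[c] / total for c in categories}


features = lexicon_features("we hate this hell of a day")
```

Such features are plain frequency ratios, which illustrates the limitation noted above: they record how often a category occurs but carry none of the sentence's original semantics.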

In the field of psychology, personality traits are defined as attribute combinations of individual thoughts and emotions that explain the differences in human behaviors [18]. The most commonly used measure is the Big Five personality model, which comprises openness, conscientiousness, extroversion, agreeableness, and neuroticism [19]. The relationship between personality and language has been studied for a long time. Psycholinguistics found an interesting phenomenon in empirical research: personality traits affect people’s use of language, in particular their choice of vocabulary. Specifically, the LIWC lexicon [20], [21] and some personality adjectives [22] (Personality Adjectives Check List) have linear correlations with each personality trait. In addition, people with the same personality traits usually have the same fixed emotional polarities [23]. The details regarding this topic are described in the Appendix. Fig. 1 shows a visual example of a neurotic user’s psycholinguistic knowledge. From the words “hate”, “murder”, and “hell”, we can roughly infer that he/she is a neurotic user. Based on the relationship between the synonym “damn”, emotional polarity, and personality traits, this inference can be made with greater confidence. Thus, conducting personality detection research from the lexical psycholinguistic knowledge perspective can bring rich structured domain knowledge rather than superficial psycholinguistic statistical information. Although research on personality detection has achieved remarkable results, some challenges remain.

  • Fusion of text semantics and psycholinguistic knowledge: It is a challenge to fully fuse lexical psycholinguistic knowledge and text semantics while accurately representing the personality traits derived from the user’s language.

  • Interpretability of personality detection: It is a challenge to utilize personality psychology knowledge to realize explainable personality detection in neural models.

To meet the above challenges, we propose a novel lexical psycholinguistic knowledge-guided graph neural network model for interpretable personality detection. Our model enriches personality document representations by incorporating heterogeneous external knowledge, using personality lexicons as intermediaries. In particular, instead of directly using existing pretrained word embeddings, we first refine a kind of personality-aware word embedding via position encoding and an attention mechanism. Second, to fully fuse knowledge and semantics, we align the personality lexicons with the constructed personality knowledge graph and automatically build a heterogeneous personality word graph for each user. Third, we develop a Heterogeneous Personality Message-passing graph neural Network (HPMN) that performs interactions among the word, emotion, and personality heterogeneous nodes along directed edges. Finally, regarding the interpretability of personality traits, we design a graph-level readout function that selectively incorporates all heterogeneous nodes into knowledge-guided document embeddings for user-generated text personality understanding and interpretation. Personality detection is thereby transformed into a heterogeneous word graph classification problem. Experiments on four public personality datasets show that our model effectively improves the accuracy of personality detection and pays more attention to critical knowledge.
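The pipeline described above (graph construction, message passing, readout) can be illustrated with a minimal, self-contained sketch. It is not the HPMN itself: the update rule is plain neighbor averaging and the readout is dot-product softmax attention, both chosen here as simple stand-ins. The knowledge table `TOY_KNOWLEDGE`, the node naming scheme, the two-dimensional initial features, and the `query` vector are all assumptions made for illustration.

```python
import math

# Hypothetical toy knowledge: word -> (emotion polarity, personality trait).
# These links are illustrative assumptions, not the paper's lexicon.
TOY_KNOWLEDGE = {
    "hate": ("negative", "neuroticism"),
    "hell": ("negative", "neuroticism"),
    "party": ("positive", "extroversion"),
}


def build_graph(words):
    """Directed edges word -> emotion and word -> trait for lexicon words."""
    edges = []
    for w in words:
        if w in TOY_KNOWLEDGE:
            emotion, trait = TOY_KNOWLEDGE[w]
            edges.append((f"word:{w}", f"emotion:{emotion}"))
            edges.append((f"word:{w}", f"trait:{trait}"))
    return sorted(set(edges))


def message_pass(features, edges):
    """One round: each node averages its own vector with its incoming sources."""
    updated = {}
    for node, vec in features.items():
        incoming = [features[src] for src, dst in edges if dst == node]
        stacked = [vec] + incoming
        updated[node] = [sum(col) / len(stacked) for col in zip(*stacked)]
    return updated


def attention_readout(features, query):
    """Softmax-weighted sum of node vectors; the weights show which nodes matter."""
    nodes = sorted(features)
    scores = [sum(a * b for a, b in zip(features[n], query)) for n in nodes]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    weights = {n: e / sum(exps) for n, e in zip(nodes, exps)}
    doc = [sum(weights[n] * features[n][k] for n in nodes) for k in range(len(query))]
    return doc, weights


words = ["i", "hate", "this", "hell"]
edges = build_graph(words)
nodes = sorted({n for edge in edges for n in edge})
# Assumed initial features: word nodes get [1, 0], knowledge nodes get [0, 1].
features = {n: [1.0, 0.0] if n.startswith("word:") else [0.0, 1.0] for n in nodes}
updated = message_pass(features, edges)
doc_embedding, weights = attention_readout(updated, query=[0.0, 1.0])
```

In this toy run, the emotion and trait nodes receive larger readout weights than the word nodes, which hints at how attention weights over heterogeneous nodes could be inspected for interpretation. A learned model would replace the averaging and the fixed query with trainable parameters.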

In summary, our contributions are as follows.

  • To the best of our knowledge, this is the first work that integrates lexical psycholinguistic knowledge and text semantics information into a neural model to achieve interpretable personality detection. Moreover, it provides support for lexical hypotheses in psycholinguistic research from a computational linguistic perspective.

  • Our model incorporates the distributed representations of words and the lexical knowledge by learning personality-aware word embeddings. In addition, we construct a heterogeneous personality word graph and develop a message-passing network, which extracts explicit lexicon and knowledge relations via the interactions among heterogeneous graph nodes. All heterogeneous nodes are selectively incorporated as knowledge-guided document embeddings for personality understanding and interpretation through a carefully designed graph readout layer.

  • Experimental results on four public datasets demonstrate that our model outperforms the state-of-the-art techniques in terms of personality detection. Our model can help various types of social software mine user information and help psychologists study and analyze personality traits in depth.

The rest of this paper is organized as follows. Section 2 introduces the work related to personality detection. Section 3 provides the problem formulation, and Section 4 describes the proposed method. Section 5 presents and analyzes the experimental results obtained on four public datasets. Finally, Section 6 outlines the conclusion and future research.

Section snippets

Related work

Owing to its wide potential application value, personality detection has gradually attracted the attention of computer science researchers [24], [25]. Although personality detection in social networks is in its infancy, scholars have achieved fruitful results from multiple research perspectives. To address the challenges mentioned in the previous section, this section focuses on the achievements of scholars in terms of four aspects: (1) psycholinguistic lexicon-based, (2) neural language

Problem formulation

Personality detection can be formulated as a user-level multilabel classification problem. Mathematically, we are given a user-generated document $D=\{s_1,s_2,\ldots,s_n\}$, where $s_i=\{w_{i1},w_{i2},\ldots,w_{im}\}$ is the $i$th sentence with $m$ words. Our goal is to detect $T$ personality traits $Y=\{y_t\}_{t=1}^{T}$ for this user based on document $D$, where $y_t\in\{0,1\}$ is a binary variable.
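As a rough illustration of this formulation, the sketch below represents one user as a document (a list of sentences, each a list of words) paired with one binary label per trait. The class `UserExample`, the trait list `TRAITS` with T = 5, and its ordering are assumptions made for this example; the formulation itself leaves T general.

```python
from dataclasses import dataclass

# Assumed label order with T = 5; the Big Five names come from the text,
# but this ordering is our own choice for illustration.
TRAITS = ["openness", "conscientiousness", "extroversion", "agreeableness", "neuroticism"]


@dataclass
class UserExample:
    document: list[list[str]]  # D: sentences s_i, each a list of words w_ij
    labels: list[int]          # Y: one binary y_t per trait

    def __post_init__(self) -> None:
        if len(self.labels) != len(TRAITS):
            raise ValueError("expected one label per trait")
        if any(y not in (0, 1) for y in self.labels):
            raise ValueError("each label must be 0 or 1")


example = UserExample(
    document=[["i", "hate", "mondays"], ["what", "the", "hell"]],
    labels=[0, 0, 0, 0, 1],
)
positive_traits = [t for t, y in zip(TRAITS, example.labels) if y == 1]
```

Because each trait has its own independent binary label, a user may be positive for several traits at once, which is what makes the task multilabel rather than multiclass.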

Proposed method

In this section, we present our GNN-based personality detection model guided by lexical psycholinguistic knowledge. Our model takes full advantage of personality lexicons as a bridge to enrich the representations of personality documents with the incorporation of heterogeneous external knowledge. As illustrated in Fig. 2, our model contains three main parts.

  • (1)

    Personality-aware word embedding: To fully fuse lexical psycholinguistic knowledge and text semantics, we design personality word position

Experimental settings

In this section, we introduce the datasets used in the experiments and present the baseline methods. After introducing the parameter settings, we present the evaluation metrics used to assess the performance of the models.

Conclusion and future research

In this paper, we present a novel personality detection model guided by lexical psycholinguistic knowledge, which not only achieves accurate personality detection results for social media texts but also enables us to explore the interpretability of personality traits via word knowledge. First, we compile a personality dictionary containing 2043 words and learn personality-aware word embeddings to obtain more accurate word vectors. Then, in combination with the background psychological

CRediT authorship contribution statement

Yangfu Zhu: Conceptualization, Methodology, Software, Writing – original draft. Linmei Hu: Methodology, Writing – review & editing. Nianwen Ning: Writing – review & editing. Wei Zhang: Writing – review & editing. Bin Wu: Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the NSFC-General Technology Basic Research Joint Funds, China under Grant (U1936220), the National Natural Science Foundation of China under Grant (61972047) and the National Key Research and Development Program of China (2018YFC0831500).

References (44)

  • Zhao P. et al.

    Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification

    Knowl.-Based Syst.

    (2020)
  • T. Shen, J. Jia, Y. Li, Y. Ma, Y. Bu, H. Wang, B. Chen, T.-S. Chua, W. Hall, Peia: Personality and emotion integrated...
  • Xu C. et al.

    Recommendation by users’ multimodal preferences for smart city applications

    IEEE Trans. Ind. Inf.

    (2020)
  • Guo A. et al.

    From affect, behavior, and cognition to personality: an integrated personal character model for individual-like intelligent artifacts

    World Wide Web

    (2020)
  • Pennebaker J.W. et al.

    Linguistic inquiry and word count: LIWC 2001

    Mahwah: Lawrence Erlbaum Assoc.

    (2001)
  • Coltheart M.

    The MRC psycholinguistic database

    Q. J. Exp. Psychol. A

    (1981)
  • P.-H. Arnoux, A. Xu, N. Boyette, J. Mahmud, R. Akkiraju, V. Sinha, 25 tweets to know you: A new model to predict...
  • Xue D. et al.

    Deep learning-based personality recognition from text posts of online social networks

    Appl. Intell.

    (2018)
  • Majumder N. et al.

    Deep learning-based document modeling for personality detection from text

    IEEE Intell. Syst.

    (2017)
  • Sun X. et al.

    Who am I? Personality detection based on deep learning for texts

  • Poria S. et al.

    Common sense knowledge based personality recognition from text

  • Mehta Y. et al.

    Bottom-up and top-down: Predicting personality with psycholinguistic and language model features
