A lexical psycholinguistic knowledge-guided graph neural network for interpretable personality detection
Introduction
With the rapid development of social media platforms, large amounts of user-generated content (UGC) can be accessed and analyzed to automatically identify authors' personality traits. Many studies have shown that automatic personality detection systems play an essential role in various applications, such as user interest mining [1], information dissemination [2], recommendation systems [3], [4], [5], and intelligent machine design [6]. Therefore, analyzing and detecting users’ personality traits is important for understanding users’ current and future psychological states and for predicting their reactions and behaviors.
Personality detection research based on user-generated text falls mainly into three lines: psycholinguistic lexicon-based methods, neural language model-based methods, and interpretability research. Earlier researchers captured psycholinguistic lexicon statistics such as Linguistic Inquiry and Word Count (LIWC) [7] and Medical Research Council (MRC) [8] features in texts for personality detection [9], [10]. However, obtaining such handcrafted features is costly, and statistical analysis cannot effectively represent the original semantics. To avoid feature engineering, deep neural models are employed to learn distributed text representations end to end, and the resulting detection accuracy is greatly improved [11], [12], [13]. However, neural language model embeddings lack the ability to explain personality. Recently, some researchers combined common knowledge to detect personality [14], [15], providing some ability to explain personality and contributing to the analysis of personality traits. More recent work has employed interpretable machine learning to quantify the impacts of various psycholinguistic statistical features [16], [17]. However, these methods do not deeply exploit psycholinguistic domain knowledge and fail to effectively integrate psycholinguistic knowledge and text semantics into the associated neural models.
In psychology, personality traits are defined as combinations of an individual's characteristic thoughts and emotions that explain differences in human behavior [18]. The most commonly used measure is the Big Five personality model, which comprises openness, conscientiousness, extroversion, agreeableness, and neuroticism [19]. The relationship between personality and language has been studied for a long time. Empirical research in psycholinguistics has found that personality traits affect people’s use of language, in particular their choice of vocabulary. Specifically, the LIWC lexicon [20], [21] and some personality adjectives [22] (Personality Adjectives Check List) have linear correlations with each personality trait. In addition, people with the same personality traits usually have the same fixed emotional polarities [23]. The details regarding this topic are described in the Appendix. Fig. 1 shows a visual example of a neurotic user’s psycholinguistic knowledge. From the words “hate”, “murder”, and “hell”, we can roughly infer that he/she is a neurotic user. Based on the relationship between the synonym “damn”, emotional polarity, and personality traits, this inference can be made with greater confidence. Conducting personality detection research from the lexical psycholinguistic knowledge perspective can therefore bring rich structured domain knowledge rather than superficial psycholinguistic statistical information. Although research on personality detection has achieved remarkable results, some challenges still remain.
- Fusion of text semantics and psycholinguistic knowledge: It is a challenge to fully fuse lexical psycholinguistic knowledge and text semantics while accurately representing the personality traits derived from the user’s language.
- Interpretability of personality detection: It is a challenge to utilize personality psychology knowledge to realize explainable personality detection in neural models.
To meet the above challenges, we propose a novel lexical psycholinguistic knowledge-guided graph neural network model for interpretable personality detection. Our model enriches personality document representations by incorporating heterogeneous external knowledge, using personality lexicons as intermediaries. In particular, instead of directly using existing pretrained word embeddings, we first refine personality-aware word embeddings via position encoding and an attention mechanism. Second, to fully fuse knowledge and semantics, we align the personality lexicons with the constructed personality knowledge graph and automatically build a heterogeneous personality word graph for each user. Then, we develop a Heterogeneous Personality Message-passing graph neural Network (HPMN) that performs interactions among the word nodes and the heterogeneous emotion and personality nodes along directed edges. Finally, regarding the interpretability of personality traits, we design a graph-level readout function, which selectively incorporates all heterogeneous nodes as knowledge-guided document embeddings to achieve personality understanding and interpretation of user-generated text. Personality detection is thus transformed into a heterogeneous word graph classification problem. Experiments on four public personality datasets show that our model can effectively improve the accuracy of personality detection and pay more attention to critical knowledge.
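To make the graph construction step more concrete, the following is a minimal Python sketch of how a heterogeneous personality word graph might be assembled for one user. It is illustrative only: the toy lexicon entries, the function name `build_word_graph`, the sliding-window rule for word-word edges, and the edge directions are all assumptions made for this example and are not taken from the model's actual construction procedure.

```python
# Hypothetical toy lexicon: word -> (emotion polarity, associated trait).
# These entries are illustrative assumptions, not the paper's dictionary.
TOY_LEXICON = {
    "hate": ("negative", "neuroticism"),
    "murder": ("negative", "neuroticism"),
    "hell": ("negative", "neuroticism"),
    "party": ("positive", "extroversion"),
}


def build_word_graph(tokens, lexicon=TOY_LEXICON, window=2):
    """Build a small heterogeneous graph for one user's tokens.

    Returns (nodes, edges): nodes is a set of (node_type, name) pairs and
    edges is a set of directed ((type, name), (type, name)) pairs.
    """
    nodes = set()
    edges = set()

    # Word-word edges (assumed rule): link words that co-occur within
    # `window` positions of each other, in both directions.
    for i, tok in enumerate(tokens):
        nodes.add(("word", tok))
        for j in range(max(0, i - window), i):
            if tokens[j] != tok:
                edges.add((("word", tokens[j]), ("word", tok)))
                edges.add((("word", tok), ("word", tokens[j])))

    # Knowledge edges (assumed directions): lexicon words point to their
    # emotion and personality nodes, and emotion points to personality.
    for tok in set(tokens):
        if tok in lexicon:
            emotion, trait = lexicon[tok]
            nodes.add(("emotion", emotion))
            nodes.add(("personality", trait))
            edges.add((("word", tok), ("emotion", emotion)))
            edges.add((("word", tok), ("personality", trait)))
            edges.add((("emotion", emotion), ("personality", trait)))

    return nodes, edges
```

Calling `build_word_graph(["i", "hate", "this", "hell"])` yields word nodes for all four tokens plus one emotion node and one personality node, which is the kind of per-user structure that a graph classifier could then consume.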
In summary, our contributions are as follows.
- To the best of our knowledge, this is the first work that integrates lexical psycholinguistic knowledge and text semantics information into a neural model to achieve interpretable personality detection. Moreover, it provides support for lexical hypotheses in psycholinguistic research from a computational linguistic perspective.
- Our model incorporates the distributed representations of words and the lexical knowledge by learning personality-aware word embeddings. In addition, we construct a heterogeneous personality word graph and develop a message-passing network, which extracts explicit lexicon and knowledge relations via the interactions among heterogeneous graph nodes. All heterogeneous nodes are selectively incorporated as knowledge-guided document embeddings for personality understanding and interpretation through a carefully designed graph readout layer.
- Experimental results on four public datasets demonstrate that our model outperforms the state-of-the-art techniques in terms of personality detection. Our model can help various types of social software mine user information and help psychologists study and analyze personality traits in depth.
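The interplay between message passing and a graph-level readout can be illustrated with a deliberately simplified sketch. The code below is not the HPMN: it substitutes plain mean aggregation over in-neighbours for the learned message-passing layers, and a per-type mean pool followed by concatenation for the designed readout. The function names, node names, and feature values are hypothetical.

```python
def mean_message_passing(features, edges):
    """One synchronous round of mean aggregation.

    Each node's new vector is the average of its own vector and the
    vectors of all nodes with a directed edge pointing into it.
    """
    incoming = {node: [vec] for node, vec in features.items()}
    for src, dst in edges:
        incoming[dst].append(features[src])
    updated = {}
    for node, vecs in incoming.items():
        dim = len(vecs[0])
        updated[node] = [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]
    return updated


def typed_readout(features, node_types,
                  type_order=("word", "emotion", "personality")):
    """Mean-pool each node type separately and concatenate the results."""
    dim = len(next(iter(features.values())))
    pooled = []
    for node_type in type_order:
        vecs = [features[n] for n in features if node_types[n] == node_type]
        if vecs:
            pooled.extend(sum(v[k] for v in vecs) / len(vecs) for k in range(dim))
        else:
            # No node of this type in the graph: pad with zeros.
            pooled.extend([0.0] * dim)
    return pooled


# Toy graph: two word nodes feed an emotion node, which feeds a trait node.
node_types = {"hate": "word", "hell": "word",
              "negative": "emotion", "neuroticism": "personality"}
features = {"hate": [1.0, 0.0], "hell": [0.0, 1.0],
            "negative": [0.0, 0.0], "neuroticism": [0.0, 0.0]}
edges = [("hate", "negative"), ("hell", "negative"),
         ("negative", "neuroticism")]

updated = mean_message_passing(features, edges)
graph_vector = typed_readout(updated, node_types)
```

After one round, the emotion node has absorbed information from both word nodes, while the personality node is still unchanged because its only in-neighbour held a zero vector before the update. Stacking further rounds would propagate word evidence to the personality node. Keeping one pooled slice per node type in `graph_vector` is one simple way to make the contribution of each knowledge source inspectable.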
The rest of this paper is organized as follows. Section 2 introduces the work related to personality detection. Section 3 provides the problem formulation and Section 4 describes the proposed method. Further, Section 5 presents and analyzes the experimental results obtained on 4 public datasets. Finally, Section 6 outlines the conclusion and future research.
Section snippets
Related work
Due to the wide potential application value, personality detection has gradually attracted the attention of computer science researchers [24], [25]. Although personality detection in social networks is in its infancy, scholars have achieved fruitful results from multiple research perspectives. Aiming at the challenges mentioned in the previous section, this section focuses on the achievements of scholars in terms of four aspects: (1) psycholinguistic lexicon-based, (2) neural language
Problem formulation
Personality detection can be formulated as a user-level multilabel classification problem. Mathematically, we are given a user-generated document $d = \{s_1, s_2, \dots, s_n\}$, where $s_i$ is the $i$-th sentence with $m_i$ words. Our goal is to detect personality traits $Y = \{y_1, y_2, \dots, y_T\}$ for this user based on document $d$, where each $y_t \in \{0, 1\}$ is a binary variable.
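As a rough illustration of this multilabel setting, the sketch below represents one user's document as a list of tokenized sentences with one binary label per trait, and converts per-trait scores into independent binary decisions. The label set is assumed to be the Big Five for the sake of the example, and the names `UserDocument`, `predict_traits`, and the 0.5 threshold are hypothetical choices rather than part of the formulation above.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed label set for this example: the Big Five traits.
BIG_FIVE = ("openness", "conscientiousness", "extroversion",
            "agreeableness", "neuroticism")


@dataclass
class UserDocument:
    """One user's text: a list of sentences, each a list of word tokens."""
    sentences: List[List[str]]
    labels: Dict[str, int]  # one binary label (0 or 1) per trait


def predict_traits(scores: Dict[str, float],
                   threshold: float = 0.5) -> Dict[str, int]:
    """Turn per-trait scores in [0, 1] into independent binary decisions.

    Each trait is decided on its own, which is what distinguishes
    multilabel classification from single-label multiclass classification.
    """
    return {trait: int(scores[trait] >= threshold) for trait in BIG_FIVE}
```

Because every trait is thresholded independently, a user can be assigned any combination of traits, including none or all of them.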
Proposed method
In this section, we present our GNN-based personality detection model guided by lexical psycholinguistic knowledge. Our model takes full advantage of personality lexicons as a bridge to enrich the representations of personality documents with the incorporation of heterogeneous external knowledge. As illustrated in Fig. 2, our model contains three main parts.
(1) Personality-aware word embedding: To fully fuse lexical psycholinguistic knowledge and text semantics, we design personality word position
Experimental settings
In this section, we introduce the datasets used in the experiments and present the baseline methods. After introducing the parameter settings, we present the evaluation metrics used to assess the performance of the models.
Conclusion and future research
In this paper, we present a novel personality detection model guided by lexical psycholinguistic knowledge, which not only achieves accurate personality detection results for social media texts but also enables us to explore the interpretability of personality traits via word knowledge. First, we summarize a personality dictionary containing 2043 words and learn personality-aware word embeddings to obtain more accurate word vectors. Then, in combination with the background psychological
CRediT authorship contribution statement
Yangfu Zhu: Conceptualization, Methodology, Software, Writing – original draft. Linmei Hu: Methodology, Writing – review & editing. Nianwen Ning: Writing – review & editing. Wei Zhang: Writing – review & editing. Bin Wu: Supervision, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by the NSFC-General Technology Basic Research Joint Funds, China under Grant (U1936220), the National Natural Science Foundation of China under Grant (61972047) and the National Key Research and Development Program of China (2018YFC0831500).
References (44)
- et al. Mining user interest based on personality-aware hybrid filtering in social networks. Knowl.-Based Syst. (2020)
- et al. Reposting negative information on microblogs: Do personality traits matter? Inf. Process. Manage. (2020)
- et al. Cross-domain recommendation with user personality. Knowl.-Based Syst. (2021)
- et al. Personality prediction system from Facebook users. Procedia Comput. Sci. (2017)
- et al. A sentiment-aware deep learning approach for personality detection from text. Inf. Process. Manage. (2021)
- et al. Knowledge of words: An interpretable approach for personality recognition from social media. Knowl.-Based Syst. (2020)
- Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers. J. Res. Personal. (2010)
- et al. Dynamic network embedding survey. Neurocomputing (2022)
- et al. FSS-GCN: A graph convolutional networks with fusion of semantic and structure for emotion cause analysis. Knowl.-Based Syst. (2021)
- et al. Jkt: A joint graph convolutional network based deep knowledge tracing. Inform. Sci. (2021)
- Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification. Knowl.-Based Syst.
- Recommendation by users’ multimodal preferences for smart city applications. IEEE Trans. Ind. Inf.
- From affect, behavior, and cognition to personality: an integrated personal character model for individual-like intelligent artifacts. World Wide Web
- Linguistic inquiry and word count: LIWC 2001. Mahwah: Lawrence Erlbaum Assoc.
- The MRC psycholinguistic database. Q. J. Exp. Psychol. A
- Deep learning-based personality recognition from text posts of online social networks. Appl. Intell.
- Deep learning-based document modeling for personality detection from text. IEEE Intell. Syst.
- Who am I? Personality detection based on deep learning for texts
- Common sense knowledge based personality recognition from text
- Bottom-up and top-down: Predicting personality with psycholinguistic and language model features