Abstract
In this paper, we address a novel task, cognition-driven multimodal personality classification (CMPC). CMPC aims to infer the personality traits (e.g., romantic, humorous, and gloomy) that a person displays in real time, approached from the perspective of cognitive psychology. The task is motivated by a cognitive-difference phenomenon: people with different personality traits tend to give different personality-oriented textual descriptions when observing an image. To tackle the noise inherent in CMPC, we propose a tailored reinforcement learning approach, multi-agent SelectNet, which integrates opinion-word and image-region selection strategies to select informative opinion-word and image-region features for CMPC. To demonstrate the effectiveness of our approach, we construct six multimodal personality classification datasets and conduct extensive experiments on them. Experimental results show that our approach significantly outperforms strong competitors, including state-of-the-art unimodal and multimodal approaches.
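The abstract describes multi-agent SelectNet only at a high level: two agents decide which opinion words and which image regions to keep, and the kept features feed a classifier. The sketch below illustrates that general idea of cooperative feature selection trained with reinforcement learning. It is not the authors' implementation. It assumes that each agent is a logistic keep/drop policy over feature vectors, that kept features are mean-pooled and concatenated, that the shared reward is the log-probability of the gold label under a fixed linear softmax classifier, and that both agents are updated with the REINFORCE policy gradient using a moving-average baseline. All names (`SelectionAgent`, `pool`, `episode`, `W_cls`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SelectionAgent:
    """Hypothetical keep/drop agent: a logistic policy over item feature vectors."""

    def __init__(self, dim, rng):
        self.theta = np.zeros(dim)
        self.rng = rng

    def act(self, feats):
        # Probability of keeping each item (an opinion word or an image region).
        probs = sigmoid(feats @ self.theta)
        actions = (self.rng.random(len(probs)) < probs).astype(float)
        return actions, probs

    def update(self, feats, actions, probs, advantage, lr):
        # REINFORCE for independent Bernoulli actions under a logistic policy:
        # d/dtheta log pi(a_i | x_i) = (a_i - p_i) * x_i
        grad = ((actions - probs)[:, None] * feats).sum(axis=0)
        self.theta += lr * advantage * grad


def pool(feats, mask):
    # Mean of the selected rows; zeros if the agent dropped every item.
    if mask.sum() == 0:
        return np.zeros(feats.shape[1])
    return (mask[:, None] * feats).sum(axis=0) / mask.sum()


def log_prob_correct(W_cls, x, label):
    # Assumed reward: log-probability of the gold label under a fixed linear softmax classifier.
    logits = W_cls @ x
    logits = logits - logits.max()  # numerical stability
    return logits[label] - np.log(np.exp(logits).sum())


def episode(word_agent, region_agent, words, regions, W_cls, label, baseline, lr):
    a_w, p_w = word_agent.act(words)
    a_r, p_r = region_agent.act(regions)
    joint = np.concatenate([pool(words, a_w), pool(regions, a_r)])
    reward = log_prob_correct(W_cls, joint, label)
    advantage = reward - baseline
    # Cooperative setting: both agents are updated with the same shared reward.
    word_agent.update(words, a_w, p_w, advantage, lr)
    region_agent.update(regions, a_r, p_r, advantage, lr)
    return reward


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_classes = 8, 3
    W_cls = rng.normal(size=(n_classes, 2 * dim))  # stand-in for a pretrained classifier
    word_agent = SelectionAgent(dim, rng)
    region_agent = SelectionAgent(dim, rng)
    baseline = 0.0
    for step in range(200):
        words = rng.normal(size=(12, dim))   # synthetic opinion-word features
        regions = rng.normal(size=(6, dim))  # synthetic image-region features
        label = int(rng.integers(n_classes))
        r = episode(word_agent, region_agent, words, regions, W_cls, label, baseline, lr=0.01)
        baseline = 0.9 * baseline + 0.1 * r  # moving-average baseline reduces gradient variance
    print("final moving-average reward:", round(float(baseline), 3))
```

The demo uses random synthetic data, so it only exercises the mechanics of sampling, pooling, rewarding, and updating. A faithful reproduction would need the paper's actual feature extractors, policy networks, and reward definition.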
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grant Nos. 62006166, 62076175, 62076176), China Postdoctoral Science Foundation (Grant No. 2019M661930), and Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD). We thank our anonymous reviewers for their helpful comments.
Supporting information
Appendixes A and B. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Gao, X., Wang, J., Li, S. et al. Cognition-driven multimodal personality classification. Sci. China Inf. Sci. 65, 202104 (2022). https://doi.org/10.1007/s11432-020-3307-3
DOI: https://doi.org/10.1007/s11432-020-3307-3