
CR-Net: A Deep Classification-Regression Network for Multimodal Apparent Personality Analysis

International Journal of Computer Vision

Abstract

First impressions strongly influence social interactions and have a high impact on both personal and professional life. In this paper, we present a deep Classification-Regression Network (CR-Net) for analyzing the Big Five personality traits and for assisting job interview recommendation in a first-impressions setup. The setup is based on the ChaLearn First Impressions dataset, which provides multimodal data (video, audio, and text transcribed from the corresponding audio) of people talking in front of a camera. To give a comprehensive prediction, we analyze the videos from both the entire scene (including the person's motions and background) and the person's face. CR-Net first performs personality trait classification and then applies regression, which yields accurate predictions for both personality traits and interview recommendation. Furthermore, we present a new loss function, called Bell Loss, to address inaccurate predictions caused by the regression-to-the-mean problem. Extensive experiments on the First Impressions dataset show the effectiveness of the proposed network, which outperforms the state of the art.
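The two-stage idea stated above (first classify each trait into a coarse level, then refine the value by regression) can be illustrated with a small sketch. This is a hypothetical toy, not the authors' CR-Net: the number of bins, the bin-centre-plus-residual scheme, and all function names here are assumptions made for illustration only.

```python
# Illustrative classification-then-regression for a trait score in [0, 1].
# Stage 1: pick a coarse bin (classification).
# Stage 2: refine with a bounded residual inside that bin (regression).

NUM_BINS = 5
BIN_WIDTH = 1.0 / NUM_BINS

def classify_bin(class_scores):
    """Stage 1: index of the highest-scoring bin."""
    return max(range(len(class_scores)), key=lambda i: class_scores[i])

def predict(class_scores, residual):
    """Stage 2: bin centre plus a residual, clamped to stay in the bin."""
    b = classify_bin(class_scores)
    centre = (b + 0.5) * BIN_WIDTH
    residual = max(-BIN_WIDTH / 2, min(BIN_WIDTH / 2, residual))
    return centre + residual

# Example: scores favour bin 3 (covering [0.6, 0.8)), residual nudges upward.
scores = [0.05, 0.10, 0.15, 0.60, 0.10]
print(round(predict(scores, 0.04), 4))  # 0.74
```

The coarse classification constrains the regression to a narrow interval, which is one way a two-stage design can avoid the large errors an unconstrained regressor makes.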


Notes

  1. Note that our aim is to analyze our network and loss proposal in order to enhance first impressions recognition. We do not argue that the interview recommendation variable has a direct application in real scenarios. Different jobs require different competences, and studying automatic recommendation of job profiles is out of the scope of this work.

  2. https://en.wikipedia.org/wiki/Regression_toward_the_mean.

  3. Images are from Lisa Feldman Barrett’s keynote speech “From Essences to Predictions: Understanding the Nature of Emotion” at the 2018 meeting of the European Society for Cognitive and Affective Neuroscience.
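The regression-to-the-mean issue referenced in note 2 can be seen in miniature: under mean squared error, the best single constant prediction for a set of targets is their mean, which is why a plain MSE-trained regressor tends to collapse its outputs toward the average trait value. A minimal check (illustrative only; the target values are made up):

```python
# Under mean squared error, the best constant prediction is the mean.
# This is the effect the Bell Loss is designed to counteract.

targets = [0.2, 0.4, 0.5, 0.9]
mean = sum(targets) / len(targets)

def mse(constant):
    return sum((t - constant) ** 2 for t in targets) / len(targets)

# Search constant guesses on a fine grid: the mean wins.
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=mse)
print(best, round(mean, 2))  # 0.5 0.5
```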


Acknowledgements

The work was supported by the National Key R&D Program of China under Grant #2018YFC0807500, the National Natural Science Foundations of China #61961160704, #61876179, #61772396, #61772392, #61902296, the Fundamental Research Funds for the Central Universities #JBF180301, Xi’an Key Laboratory of Big Data and Intelligent Vision #201805053ZD4CG37, the Science and Technology Development Fund of Macau (#0008/2018/A1, #0025/2019/A1, #0010/2019/AFJ, #0025/2019/AKP), Spanish project TIN2016-74946-P (MINECO/FEDER, UE) and CERCA Programme/Generalitat de Catalunya.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jun Wan.

Additional information

Communicated by Wenjun Zeng.


Cite this article

Li, Y., Wan, J., Miao, Q. et al. CR-Net: A Deep Classification-Regression Network for Multimodal Apparent Personality Analysis. Int J Comput Vis 128, 2763–2780 (2020). https://doi.org/10.1007/s11263-020-01309-y
