
CR-Net: A Deep Classification-Regression Network for Multimodal Apparent Personality Analysis

International Journal of Computer Vision

Abstract

First impressions strongly influence social interactions and have a high impact on both personal and professional life. In this paper, we present a deep Classification-Regression Network (CR-Net) for analyzing the Big Five personality traits and for assisting job interview recommendation in a first-impressions setup. The setup is based on the ChaLearn First Impressions dataset, which provides multimodal data (video, audio, and text transcribed from the corresponding audio) of people talking in front of a camera. To give a comprehensive prediction, we analyze the videos from both the entire scene (including the person's motions and background) and the person's face. CR-Net first performs personality trait classification and then applies regression, which yields accurate predictions for both personality traits and interview recommendation. Furthermore, we present a new loss function, called Bell Loss, to address inaccurate predictions caused by the regression-to-the-mean problem. Extensive experiments on the First Impressions dataset show the effectiveness of the proposed network, which outperforms the state of the art.
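The two-stage idea stated above (first classify each trait into a coarse level, then refine the value by regression) can be illustrated with a small sketch. This is a hypothetical toy, not the authors' CR-Net: the number of bins, the bin-centre-plus-residual scheme, and all function names here are assumptions made for illustration only.

```python
# Illustrative classification-then-regression for a trait score in [0, 1].
# Stage 1: pick a coarse bin (classification).
# Stage 2: refine with a bounded residual inside that bin (regression).

NUM_BINS = 5
BIN_WIDTH = 1.0 / NUM_BINS

def classify_bin(class_scores):
    """Stage 1: index of the highest-scoring bin."""
    return max(range(len(class_scores)), key=lambda i: class_scores[i])

def predict(class_scores, residual):
    """Stage 2: bin centre plus a residual, clamped to stay in the bin."""
    b = classify_bin(class_scores)
    centre = (b + 0.5) * BIN_WIDTH
    residual = max(-BIN_WIDTH / 2, min(BIN_WIDTH / 2, residual))
    return centre + residual

# Example: scores favour bin 3 (covering [0.6, 0.8)), residual nudges upward.
scores = [0.05, 0.10, 0.15, 0.60, 0.10]
print(round(predict(scores, 0.04), 4))  # 0.74
```

The coarse classification constrains the regression to a narrow interval, which is one way a two-stage design can avoid the large errors an unconstrained regressor makes.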


Notes

  1. Note that our aim is to analyze our network and loss proposal in order to enhance first impressions recognition. We do not argue that the interview recommendation variable has a direct application in real scenarios. Different jobs require different competences, and studying automatic recommendation of job profiles is out of the scope of this work.

  2. https://en.wikipedia.org/wiki/Regression_toward_the_mean.

  3. Images are from Lisa Feldman Barrett’s keynote speech “From Essences to Predictions: Understanding the Nature of Emotion” at the 2018 meeting of the European Society for Cognitive and Affective Neuroscience.
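The regression-to-the-mean issue referenced in note 2 can be seen in miniature: under mean squared error, the best single constant prediction for a set of targets is their mean, which is why a plain MSE-trained regressor tends to collapse its outputs toward the average trait value. A minimal check (illustrative only; the target values are made up):

```python
# Under mean squared error, the best constant prediction is the mean.
# This is the effect the Bell Loss is designed to counteract.

targets = [0.2, 0.4, 0.5, 0.9]
mean = sum(targets) / len(targets)

def mse(constant):
    return sum((t - constant) ** 2 for t in targets) / len(targets)

# Search constant guesses on a fine grid: the mean wins.
candidates = [i / 100 for i in range(101)]
best = min(candidates, key=mse)
print(best, round(mean, 2))  # 0.5 0.5
```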


Acknowledgements

The work was supported by the National Key R&D Program of China under Grant #2018YFC0807500, the National Natural Science Foundations of China #61961160704, #61876179, #61772396, #61772392, #61902296, the Fundamental Research Funds for the Central Universities #JBF180301, Xi’an Key Laboratory of Big Data and Intelligent Vision #201805053ZD4CG37, the Science and Technology Development Fund of Macau (#0008/2018/A1, #0025/2019/A1, #0010/2019/AFJ, #0025/2019/AKP), Spanish project TIN2016-74946-P (MINECO/FEDER, UE) and CERCA Programme/Generalitat de Catalunya.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jun Wan.

Additional information

Communicated by Wenjun Zeng.


Cite this article

Li, Y., Wan, J., Miao, Q. et al. CR-Net: A Deep Classification-Regression Network for Multimodal Apparent Personality Analysis. Int J Comput Vis 128, 2763–2780 (2020). https://doi.org/10.1007/s11263-020-01309-y
