Abstract
Spontaneous facial expression recognition has attracted considerable attention in recent years, yet most existing algorithms still face performance bottlenecks caused by the large amount of redundant image data in video. In this paper, we propose a novel co-salient facial feature extraction algorithm that combines the human visual attention mechanism with group-based co-processing of image data, greatly reducing the redundant information in the original images and effectively improving facial expression recognition accuracy. First, based on the human visual mechanism, expression key frames are dynamically extracted from the original videos to capture the temporal dynamics of facial expressions. Second, salient regions are obtained from the key-frame sequence by a multiplicative fusion algorithm in a multi-image cooperative manner. Third, we discard those salient regions that show little deformation and low correlation with facial expressions, reducing the amount of facial feature data. Finally, we extract Local Binary Pattern (LBP) features from the remaining facial regions and classify them with a Support Vector Machine (SVM). Experimental results on the Extended Cohn-Kanade (CK+) and MMI datasets show that the proposed method effectively improves the recognition accuracy of spontaneous expression sequences.
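The final stage of the pipeline (LBP features fed to an SVM classifier) can be sketched roughly as follows. This is a minimal illustration only: the key-frame selection and co-saliency stages are omitted, the image patches are synthetic stand-ins for extracted facial regions, and the parameter choices (8 neighbors, radius 1, uniform patterns, RBF kernel) are common defaults, not the paper's reported settings.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray, P=8, R=1.0):
    """Uniform LBP codes for a grayscale patch, summarized as a
    normalized histogram that serves as the feature vector."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    n_bins = P + 2  # uniform LBP with P neighbors yields P + 2 distinct codes
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist

# Toy stand-in data: random 32x32 "facial region" patches, two expression classes.
rng = np.random.default_rng(0)
patches = [(rng.random((32, 32)) * 255).astype(np.uint8) for _ in range(40)]
X = np.stack([lbp_histogram(p) for p in patches])
y = np.array([0] * 20 + [1] * 20)

# One SVM over the LBP histograms; the paper classifies each region's
# features "respectively", which would mean one such classifier per region.
clf = SVC(kernel="rbf").fit(X, y)
pred = clf.predict(X)
```

In practice each retained co-salient region would contribute its own LBP histogram, and the histograms could either be concatenated or classified separately and fused.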
Acknowledgements
This work was funded by the Scientific Project of the Guangdong Provincial Transport Department (No. Sci & Tec-2016-02-30) and the Surface Project of the Natural Science Foundation of Guangdong Province (Nos. 2016A030313703 and 2016A030313713).
© 2020 Springer Nature Switzerland AG
Cite this paper
Zhang, L., Ji, Q., Jiang, W., Ning, D. (2020). Spontaneous Expression Recognition Based on Visual Attention Mechanism and Co-salient Features. In: Chen, X., Yan, H., Yan, Q., Zhang, X. (eds) Machine Learning for Cyber Security. ML4CS 2020. Lecture Notes in Computer Science(), vol 12488. Springer, Cham. https://doi.org/10.1007/978-3-030-62463-7_26
Print ISBN: 978-3-030-62462-0
Online ISBN: 978-3-030-62463-7