MEC 2016: The Multimodal Emotion Recognition Challenge of CCPR 2016

Li, Ya; Tao, Jianhua; Schuller, Björn; Shan, Shiguang; Jiang, Dongmei; Jia, Jia

doi:10.1007/978-981-10-3005-5_55

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)

Included in the following conference series:

Chinese Conference on Pattern Recognition

Abstract

Emotion recognition is a significant research field in pattern recognition and artificial intelligence. The Multimodal Emotion Recognition Challenge (MEC) is part of the 2016 Chinese Conference on Pattern Recognition (CCPR). The goal of the competition is to compare multimedia processing and machine learning methods for multimodal emotion recognition. The challenge also aims to provide a common benchmark data set, to bring together the audio and video emotion recognition communities, and to promote research in multimodal emotion recognition. The data used in the challenge is the Chinese Natural Audio-Visual Emotion Database (CHEAVD), whose clips are selected from Chinese movies and TV programs. The discrete emotion labels were annotated by four experienced assistants. Three sub-challenges are defined: audio, video, and multimodal emotion recognition. This paper introduces the baseline audio and visual features and the recognition results obtained with Random Forests.
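
As a rough illustration of the three sub-challenges, the sketch below trains one Random Forest per setting (audio only, video only, and both together) on synthetic per-clip feature vectors. It is not the paper's baseline: the feature dimensions, the number of classes, the train/test split, and the use of feature concatenation for the multimodal case are all assumptions made for the example, and scikit-learn is used only because it offers a readily available Random Forest implementation. No CHEAVD data is involved.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical sizes chosen for illustration; they do not come from the paper.
N_CLASSES, N_CLIPS, AUDIO_DIM, VIDEO_DIM = 4, 400, 20, 30


def make_synthetic_clips(seed=0):
    """Create random per-clip audio and video features with a class-dependent shift."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, N_CLASSES, size=N_CLIPS)
    # Shift every feature by a multiple of the class index so classes are separable.
    audio = rng.normal(size=(N_CLIPS, AUDIO_DIM)) + 0.5 * labels[:, None]
    video = rng.normal(size=(N_CLIPS, VIDEO_DIM)) + 0.5 * labels[:, None]
    return audio, video, labels


def run_sub_challenges(seed=0):
    """Train one Random Forest per sub-challenge and return held-out accuracy."""
    audio, video, labels = make_synthetic_clips(seed)
    feature_sets = {
        "audio": audio,
        "video": video,
        # Assumed fusion scheme: concatenate both modalities for each clip.
        "multimodal": np.hstack([audio, video]),
    }
    # Use the same clip-level split for all three settings.
    idx_train, idx_test = train_test_split(
        np.arange(N_CLIPS), test_size=0.25, random_state=seed, stratify=labels
    )
    scores = {}
    for name, features in feature_sets.items():
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(features[idx_train], labels[idx_train])
        scores[name] = clf.score(features[idx_test], labels[idx_test])
    return scores


scores = run_sub_challenges()
for name, accuracy in scores.items():
    print(f"{name}: accuracy = {accuracy:.3f}")
```

Sharing one split across the three settings keeps the comparison between modalities fair, since every classifier is evaluated on the same held-out clips.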

Notes

1. http://ufldl.stanford.edu/wiki/index.php/Implementing_PCA/Whitening

Acknowledgement

This work is supported by the National High-Tech Research and Development Program of China (863 Program) (No. 2015AA016305), the National Natural Science Foundation of China (NSFC) (No. 61305003, No. 61425017), the Strategic Priority Research Program of the CAS (Grant XDB02080006), and partly supported by the Major Program for the National Social Science Fund of China (13 & ZD189).

We thank the data providers for their kind permission to make their data available for non-commercial, scientific use. Due to space limitations, the providers’ information is available at http://www.speakit.cn/. The corpus can be obtained freely from ChineseLDC, http://www.chineseldc.org.

Author information

Authors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, People’s Republic of China
Ya Li & Jianhua Tao
University of Chinese Academy of Sciences, Beijing, People’s Republic of China
Jianhua Tao
Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany
Björn Schuller
Department of Computing, Imperial College London, London, UK
Björn Schuller
Harbin Institute of Technology, Harbin, People’s Republic of China
Björn Schuller
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, People’s Republic of China
Shiguang Shan
Northwestern Polytechnical University, Xi’an, People’s Republic of China
Dongmei Jiang
Tsinghua University, Beijing, People’s Republic of China
Jia Jia

Corresponding author

Correspondence to Ya Li.

Editor information

Editors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an, China
Xuelong Li
Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China
Xilin Chen
Tsinghua University, Beijing, China
Jie Zhou
Nanjing University of Science and Technology, Nanjing, China
Jian Yang
University of Electronic Science and Technology, Chengdu, Sichuan, China
Hong Cheng

About this paper

Cite this paper

Li, Y., Tao, J., Schuller, B., Shan, S., Jiang, D., Jia, J. (2016). MEC 2016: The Multimodal Emotion Recognition Challenge of CCPR 2016. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_55

DOI: https://doi.org/10.1007/978-981-10-3005-5_55
Published: 22 October 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science, Computer Science (R0)
