Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition

Zhang, Shiqing; Zhao, Xiaoming; Chuang, Yuelong; Guo, Wenping; Chen, Ying

doi:10.1007/978-981-10-3005-5_53

Conference paper
First Online: 22 October 2016

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)

Abstract

Speech emotion recognition is an interesting and challenging subject because of the emotion gap between speech signals and high-level speech emotion. To bridge this gap, this paper presents a method for Chinese speech emotion recognition using a deep belief network (DBN). The DBN performs unsupervised feature learning on the extracted low-level acoustic features. A multi-layer perceptron (MLP) is then initialized from the learned hidden layers of the DBN and used for Chinese speech emotion classification. Experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) show that the presented method obtains a classification accuracy of 32.80% and a macro average precision of 41.54% on the CHEAVD test data, significantly outperforming the baseline results provided by the organizers in the speech emotion recognition sub-challenges.
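
The pipeline described in the abstract (unsupervised DBN feature learning, then an MLP initialized from the DBN) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the greedy layer-wise training with one-step contrastive divergence, the layer sizes, the learning rate, the class count, and the random placeholder features are all assumptions chosen for illustration, and the supervised fine-tuning of the MLP is left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def fit(self, data, epochs=5, lr=0.05, batch_size=16):
        for _ in range(epochs):
            for start in range(0, len(data), batch_size):
                v0 = data[start:start + batch_size]
                h0 = self.hidden_probs(v0)
                # Sample binary hidden states, then reconstruct the visibles.
                h_sample = (self.rng.random(h0.shape) < h0).astype(float)
                v1 = sigmoid(h_sample @ self.W.T + self.b_vis)
                h1 = self.hidden_probs(v1)
                # CD-1 update: data statistics minus reconstruction statistics.
                self.W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
                self.b_vis += lr * (v0 - v1).mean(axis=0)
                self.b_hid += lr * (h0 - h1).mean(axis=0)
        return self

def pretrain_dbn(data, layer_sizes, rng):
    """Greedy layer-wise pretraining: each RBM is trained on the
    hidden activations of the RBM below it."""
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng).fit(layer_input)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)
    return rbms

def init_mlp_from_dbn(rbms, n_classes, rng):
    """Copy RBM weights into the MLP hidden layers; the output layer
    has no pretrained counterpart, so it starts from small random values."""
    layers = [(rbm.W.copy(), rbm.b_hid.copy()) for rbm in rbms]
    n_top = rbms[-1].W.shape[1]
    layers.append((rng.normal(0.0, 0.01, size=(n_top, n_classes)),
                   np.zeros(n_classes)))
    return layers

def mlp_forward(layers, x):
    """Sigmoid hidden layers followed by a softmax output layer."""
    for W, b in layers[:-1]:
        x = sigmoid(x @ W + b)
    W, b = layers[-1]
    logits = x @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Placeholder for acoustic features scaled to [0, 1]; not real CHEAVD data.
features = rng.random((64, 20))
rbms = pretrain_dbn(features, layer_sizes=[16, 8], rng=rng)
# n_classes=8 is a placeholder value, not taken from the paper.
mlp = init_mlp_from_dbn(rbms, n_classes=8, rng=rng)
probs = mlp_forward(mlp, features)
```

In a full system the initialized MLP would then be fine-tuned with labeled utterances by backpropagation; the sketch stops at initialization so that the hand-off from the DBN to the MLP stays visible.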

Acknowledgments

This work is supported by the Zhejiang Provincial Natural Science Foundation of China under Grant Nos. LY16F020011 and LY14F020036, and by the National Natural Science Foundation of China under Grant Nos. 61203257 and 61272261.

Author information

Authors and Affiliations

Institute of Intelligent Information Processing, Taizhou University, Taizhou, China
Shiqing Zhang, Xiaoming Zhao, Yuelong Chuang, Wenping Guo & Ying Chen

Corresponding author

Correspondence to Shiqing Zhang.

Editor information

Editors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China
Xuelong Li
Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China
Xilin Chen
Tsinghua University, Beijing, China
Jie Zhou
Nanjing University of Science and Technology, Nanjing, China
Jian Yang
University of Electronic Science and Technology, Chengdu, Sichuan, China
Hong Cheng

About this paper

Cite this paper

Zhang, S., Zhao, X., Chuang, Y., Guo, W., Chen, Y. (2016). Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_53

DOI: https://doi.org/10.1007/978-981-10-3005-5_53
Published: 22 October 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science, Computer Science (R0)
