Skip to main content

Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition

  • Conference paper
  • First Online:

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 663))

Abstract

Speech emotion recognition is an interesting and challenging subject due to the emotion gap between speech signals and high-level speech emotion. To bridge this gap, this paper present a method of Chinese speech emotion recognition using Deep belief networks (DBN). DBN is used to perform unsupervised feature learning on the extracted low-level acoustic features. Then, Multi-layer Perceptron (MLP) is initialized in terms of the learning results of hidden layer of DBN, and employed for Chinese speech emotion classification. Experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD), show that the presented method obtains a classification accuracy of 32.80 % and macro average precision of 41.54 % on the testing data from the CHEAVD dataset on speech emotion recognition tasks, significantly outperforming the baseline results provided by the organizers in the speech emotion recognition sub-challenges.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Zeng, Z., Pantic, M., Roisman, G., Huang, T.: A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31(1), 39–58 (2009)

    Article  Google Scholar 

  2. Zixing, Z., Coutinho, E., Jun, D., Schuller, B.: Cooperative learning and its application to emotion recognition from speech. IEEE/ACM Trans. Audio Speech Lang. Process. 23(1), 115–126 (2015)

    Google Scholar 

  3. El Ayadi, M., Kamel, M.S., Karray, F.: Survey on speech emotion recognition: features, classification schemes, and databases. Pattern Recogn. 44(3), 572–587 (2011)

    Article  MATH  Google Scholar 

  4. Anagnostopoulos, C.-N., Iliou, T., Giannoukos, I.: Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011. Artif. Intell. Rev. 43(2), 155–177 (2015)

    Article  Google Scholar 

  5. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  6. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)

    Article  MathSciNet  MATH  Google Scholar 

  7. Li, X., Yang, Y., Pang, Z., Wu, X.: A comparative study on selecting acoustic modeling units in deep neural networks based large vocabulary Chinese speech recognition. Neurocomputing 170, 251–256 (2015)

    Article  Google Scholar 

  8. Sarikaya, R., Hinton, G.E., Deoras, A.: Application of deep belief networks for natural language understanding. IEEE Trans. Audio Speech Lang. Process. 22(4), 778–784 (2014)

    Article  Google Scholar 

  9. Lu, Z., Quo, D., Garakani, A.B., Liu, K., May, A.: A comparison between deep neural nets and Kernel acoustic models for speech recognition. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, pp. 5070–5074 (2016)

    Google Scholar 

  10. Stuhlsatz, A., Meyer, C., Eyben, F., ZieIke, T., Meier, G., Schuller, B.: Deep neural networks for acoustic emotion recognition: raising the benchmarks. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, pp. 5688–5691 (2011)

    Google Scholar 

  11. Li, Y., Tao, J., Schuller, B., Shan, S., Jiang, D., Jia, J.: MEC 2016: the multimodal emotion recognition challenge of CCPR 2016. In: 2016 Chinese Conference on Pattern Recognition (CCPR), Chengdu, China (2016)

    Google Scholar 

  12. Schuller, B., Steidl, S., Batliner, A., Vinciarelli, A., Scherer, K., Ringeval, F., Chetouani, M., Weninger, F., Eyben, F., Marchi, E., Mortillaro, M., Salamin, H., Polychroniou, A., Valente, F., Kim, S.: The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism. In: INTERSPEECH 2013, Lyon, France (2013)

    Google Scholar 

  13. Eyben, F., Weninger, F., Gross, F., Schuller, B.: Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In: Proceedings of the 21st ACM International Conference on Multimedia, New York, USA, pp. 835–838 (2013)

    Google Scholar 

  14. Weninger, F., Eyben, F., Schuller, B.W., Mortillaro, M., Scherer, K.R.: On the acoustics of emotion in audio: what speech, music, and sound have in common. Front. Emot. Sci. 4(292), 1–12 (2013)

    Google Scholar 

Download references

Acknowledgments

This work is supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. LY16F020011, and No. LY14F020036, National Natural Science Foundation of China under Grant No. 61203257 and No. 61272261.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shiqing Zhang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, S., Zhao, X., Chuang, Y., Guo, W., Chen, Y. (2016). Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_53

Download citation

  • DOI: https://doi.org/10.1007/978-981-10-3005-5_53

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3004-8

  • Online ISBN: 978-981-10-3005-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics