Abstract
With the rapid development of artificial intelligence, the recognition of speech, text, physiological signals and facial expressions has attracted increasing attention from researchers in China and abroad. Rather than studying the problems of each area in isolation, it is therefore worth looking for similarities across fields. In this paper, image enhancement methods are adapted to the characteristics of speech, and several feasible speech data enhancement methods are proposed to alleviate the difficulty of data collection and the limited size of corpora in speech emotion recognition. Using a hybrid neural network model that combines a convolutional neural network (CNN) and a recurrent neural network (RNN), the feasibility and performance of these methods are verified through several sets of comparative experiments, in which high recognition accuracy is obtained.
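The abstract does not list which image enhancement methods are adapted, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a spectrogram stored as a list of time frames (each a list of frequency-bin values) and applies common image-style transformations: translation along the time axis, brightness-like amplitude scaling, cutout-style masking of a time band and a frequency band, and additive Gaussian noise. All function names and parameter values (`time_shift`, `scale`, `mask`, `add_noise`, `augment`, the 0.8 to 1.2 scaling range, the noise level) are hypothetical.

```python
import random
from typing import List

# Hypothetical representation: rows are time frames, columns are frequency bins.
Spectrogram = List[List[float]]


def time_shift(spec: Spectrogram, shift: int) -> Spectrogram:
    """Circularly shift frames along the time axis (analogue of image translation)."""
    n = len(spec)
    shift %= n
    if shift == 0:
        return [row[:] for row in spec]
    # The last `shift` frames wrap around to the front.
    return [row[:] for row in spec[-shift:] + spec[:-shift]]


def scale(spec: Spectrogram, factor: float) -> Spectrogram:
    """Multiply every magnitude by `factor` (analogue of brightness adjustment)."""
    return [[v * factor for v in row] for row in spec]


def mask(spec: Spectrogram, t0: int, t_width: int,
         f0: int, f_width: int, fill: float = 0.0) -> Spectrogram:
    """Overwrite a band of frames and a band of bins (analogue of cutout / random erasing)."""
    out = [row[:] for row in spec]
    for t in range(t0, min(t0 + t_width, len(out))):
        out[t] = [fill] * len(out[t])
    for row in out:
        for f in range(f0, min(f0 + f_width, len(row))):
            row[f] = fill
    return out


def add_noise(spec: Spectrogram, std: float, rng: random.Random) -> Spectrogram:
    """Add zero-mean Gaussian noise to every value (analogue of image noise injection)."""
    return [[v + rng.gauss(0.0, std) for v in row] for row in spec]


def augment(spec: Spectrogram, rng: random.Random, copies: int) -> List[Spectrogram]:
    """Produce `copies` randomly transformed versions of one spectrogram."""
    out = []
    for _ in range(copies):
        s = time_shift(spec, rng.randrange(len(spec)))
        s = scale(s, rng.uniform(0.8, 1.2))
        s = mask(s, rng.randrange(len(s)), 2, rng.randrange(len(s[0])), 2)
        s = add_noise(s, 0.01, rng)
        out.append(s)
    return out


if __name__ == "__main__":
    spec = [[float(t * 10 + f) for f in range(4)] for t in range(5)]
    print(time_shift(spec, 1)[0])  # [40.0, 41.0, 42.0, 43.0]
    print(len(augment(spec, random.Random(0), 3)))  # 3
```

Each original utterance thus yields several extra training examples of the same shape, which is the property a CNN+RNN classifier needs to consume the enlarged set unchanged. Whether a given transformation preserves the emotional content of speech is an empirical question that the paper's comparative experiments address.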
Acknowledgment
This work was supported by the State Key Program of National Natural Science of China (61432004, 71571058, 61461045). It was partially supported by the China Postdoctoral Science Foundation funded project (2017T100447) and by the National Natural Science Foundation of China under Grant No. 61472117. It was also supported by the foundational application research program of the Qinghai Province Science and Technology Fund (No. 2016-ZJ-743).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Cao, X., Sun, X., Ren, F. (2018). Speech Data Enhancement Based on Hybrid Neural Network. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11166. Springer, Cham. https://doi.org/10.1007/978-3-030-00764-5_33
DOI: https://doi.org/10.1007/978-3-030-00764-5_33
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00763-8
Online ISBN: 978-3-030-00764-5
eBook Packages: Computer Science, Computer Science (R0)