Abstract
A novel nonlinear prediction model for Chinese speech time series is proposed. To reconstruct the phase space of the Chinese speech signal, the delay time and the embedding dimension are calculated with the C–C method and the false nearest neighbor algorithm, respectively. The maximum Lyapunov exponent and the correlation dimension of Chinese speech phonemes are calculated with the Wolf algorithm and the Grassberger–Procaccia (G–P) algorithm, respectively. The numerical results show that the Chinese speech signal has nonlinear characteristics. Based on a radial basis function (RBF) neural network and nonlinear characteristic parameters such as the delay time and the embedding dimension, a nonlinear prediction model is designed. To further verify the prediction performance of this model, waveform comparison and four evaluation indexes are used. Compared with a linear prediction model and a back propagation (BP) neural network nonlinear prediction model, the RBF neural network nonlinear prediction model significantly reduces the prediction error and achieves higher prediction accuracy.
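The abstract reconstructs the phase space from delay vectors and chooses the embedding dimension with the false nearest neighbor (FNN) algorithm. The sketch below shows one common form of that step. It assumes the Kennel et al. (1992) criteria with the frequently quoted thresholds R_tol = 15 and A_tol = 2, Euclidean distances, and a 1% cut-off for declaring the attractor unfolded; the function names and defaults are illustrative, not taken from the paper.

```python
import numpy as np


def delay_embed(x, m, tau):
    """Delay vectors X_i = [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}] as rows."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this (m, tau)")
    return np.column_stack([x[k * tau : k * tau + n] for k in range(m)])


def false_nearest_fraction(x, m, tau, rtol=15.0, atol=2.0):
    """Fraction of nearest neighbours in dimension m that are 'false',
    i.e. that fly apart when the (m+1)-th delay coordinate is added."""
    x = np.asarray(x, dtype=float)
    X = delay_embed(x, m + 1, tau)      # keep only points that have an (m+1)-th coordinate
    Xm, extra = X[:, :m], X[:, m]
    sigma = np.std(x)                   # attractor size used by criterion 2
    n_false = n_valid = 0
    for i in range(len(Xm)):
        d = np.linalg.norm(Xm - Xm[i], axis=1)
        d[i] = np.inf                   # exclude the point itself
        j = int(np.argmin(d))
        r_m = d[j]
        if r_m == 0.0:                  # exact duplicate: ratio undefined, skip it
            continue
        gap = abs(extra[i] - extra[j])
        r_m1 = np.hypot(r_m, gap)       # neighbour distance in dimension m+1
        n_valid += 1
        # criterion 1: relative jump; criterion 2: distance large relative to attractor
        if gap / r_m > rtol or r_m1 / sigma > atol:
            n_false += 1
    return n_false / max(n_valid, 1)


def min_embedding_dimension(x, tau, max_m=10, threshold=0.01):
    """Smallest m whose FNN fraction falls below `threshold` (hypothetical cut-off)."""
    for m in range(1, max_m + 1):
        if false_nearest_fraction(x, m, tau) < threshold:
            return m
    return max_m
```

For a speech frame `x` with a delay `tau` already chosen, `min_embedding_dimension(x, tau)` returns the first dimension at which the FNN fraction becomes negligible; the pairwise search is O(n²), so it suits short frames.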
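The delay time in the abstract comes from the C–C method. A minimal sketch of that statistic follows, assuming the usual choices of Kim, Eykholt and Salas (1999): m = 2, ..., 5, radii r_j = jσ/2 for j = 1, ..., 4, sup-norm correlation sums computed on t disjoint subseries, S(m, r, t) = (1/t) Σ_s [C_s(m, r, t) − C_s(1, r, t)^m], the delay τ_d taken at the first local minimum of ΔS̄(t), and the delay window τ_w at the global minimum of S_cor(t) = ΔS̄(t) + |S̄(t)|. The function names and the `max_t` default are hypothetical.

```python
import numpy as np


def _corr_sums(x, m, radii):
    """Correlation sums C(m, r) of a series embedded with lag 1 (sup norm),
    one value per radius in `radii`."""
    n = len(x) - m + 1
    if n < 2:
        return np.zeros(len(radii))
    D = np.zeros((n, n))
    for k in range(m):                  # Chebyshev distance built coordinate by coordinate
        seg = x[k : k + n]
        D = np.maximum(D, np.abs(seg[:, None] - seg[None, :]))
    d = D[np.triu_indices(n, k=1)]      # distinct pairs only
    return np.array([np.mean(d < r) for r in radii])


def cc_method(x, max_t=20, ms=(2, 3, 4, 5), js=(1, 2, 3, 4)):
    """C-C method: returns (tau_d, tau_w, S_bar, dS_bar, S_cor) for t = 1..max_t."""
    x = np.asarray(x, dtype=float)
    radii = [j * np.std(x) / 2 for j in js]
    s_bar, ds_bar = np.zeros(max_t), np.zeros(max_t)
    for t in range(1, max_t + 1):
        S = np.zeros((len(ms), len(radii)))
        for s in range(t):              # t disjoint subseries x_s, x_{s+t}, x_{s+2t}, ...
            sub = x[s::t]
            c1 = _corr_sums(sub, 1, radii)
            for a, m in enumerate(ms):
                S[a] += _corr_sums(sub, m, radii) - c1 ** m
        S /= t
        s_bar[t - 1] = S.mean()                                 # mean over m and r
        ds_bar[t - 1] = (S.max(axis=1) - S.min(axis=1)).mean()  # spread over r, mean over m
    s_cor = ds_bar + np.abs(s_bar)
    # tau_d: first local minimum of dS_bar(t); fall back to the global minimum
    tau_d = next((t for t in range(2, max_t)
                  if ds_bar[t - 1] <= ds_bar[t - 2] and ds_bar[t - 1] <= ds_bar[t]),
                 int(np.argmin(ds_bar)) + 1)
    tau_w = int(np.argmin(s_cor)) + 1   # delay window, tau_w = (m - 1) * tau_d
    return tau_d, tau_w, s_bar, ds_bar, s_cor
```

Because τ_w = (m − 1)τ_d, the returned window also gives a rough embedding dimension that can be cross-checked against the FNN result. The O(n²) pair counting keeps this practical only for short frames.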
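The maximum Lyapunov exponent is estimated in the abstract with the Wolf algorithm. The following is a simplified fixed-evolution-time variant, offered as a sketch rather than the author's implementation. A fiducial trajectory and one neighbour are evolved for `evolve` samples, the logarithmic stretch of their separation is accumulated, and the neighbour is then replaced by the closest point, within a distance band from `scal_min` to `scal_max`, whose direction from the fiducial point lies within `max_angle` of the evolved separation. All parameter names and defaults (a Theiler window of 10 samples, a 0.3 rad angle bound, the band limits) are assumptions. Wolf et al. report exponents in bits; this sketch uses natural logarithms per sample.

```python
import numpy as np


def delay_embed(x, m, tau):
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[k * tau : k * tau + n] for k in range(m)])


def wolf_lle(x, m, tau, evolve=1, theiler=10, max_angle=0.3,
             scal_min=None, scal_max=None):
    """Largest Lyapunov exponent in nats per sample (simplified Wolf-type scheme)."""
    x = np.asarray(x, dtype=float)
    X = delay_embed(x, m, tau)
    n, idx = len(X), np.arange(len(X))
    scal_min = 1e-3 * np.std(x) if scal_min is None else scal_min
    scal_max = 0.1 * (x.max() - x.min()) if scal_max is None else scal_max

    def candidates(i):
        # admissible replacements: not temporally close to i, can still be evolved,
        # and inside the [scal_min, scal_max] distance band around X[i]
        d = np.linalg.norm(X - X[i], axis=1)
        ok = (np.abs(idx - i) > theiler) & (idx + evolve < n) & (d >= scal_min) & (d <= scal_max)
        return idx[ok], d[ok]

    i = 0
    cand, dist = candidates(i)
    if cand.size == 0:
        raise ValueError("no admissible neighbour; try a larger scal_max")
    j = int(cand[np.argmin(dist)])
    total, steps = 0.0, 0
    while i + evolve < n and j + evolve < n:
        d0 = np.linalg.norm(X[j] - X[i])
        d1 = np.linalg.norm(X[j + evolve] - X[i + evolve])
        total += np.log(d1 / d0)        # stretch of the separation over `evolve` samples
        steps += evolve
        i += evolve
        old = X[j + evolve] - X[i]      # evolved separation vector
        old_norm = np.linalg.norm(old)
        cand, dist = candidates(i)
        if cand.size == 0 or old_norm == 0.0:
            j += evolve                 # nothing to swap in: keep following the old neighbour
            continue
        cosang = np.abs((X[cand] - X[i]) @ old) / (dist * old_norm)
        aligned = cosang >= np.cos(max_angle)
        if aligned.any():               # closest point roughly along the old direction
            j = int(cand[aligned][np.argmin(dist[aligned])])
        else:                           # otherwise the best-aligned admissible point
            j = int(cand[np.argmax(cosang)])
    return total / steps
```

Multiplying the result by the sampling rate expresses it per second; a clearly positive value is the usual indication of sensitive dependence on initial conditions.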
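The correlation dimension follows from the slope of log C(r) against log r, where C(r) = 2/(N(N − 1)) Σ_{i<j} Θ(r − ‖X_i − X_j‖) is the fraction of pairs of delay vectors closer than r (Grassberger and Procaccia 1983). The sketch below fits that slope by least squares over a user-chosen range [r_min, r_max], which is assumed to lie inside the scaling region; picking that region from the log–log plot is left to the user, and the function name is hypothetical.

```python
import numpy as np


def delay_embed(x, m, tau):
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[k * tau : k * tau + n] for k in range(m)])


def correlation_dimension(x, m, tau, r_min, r_max, n_r=10):
    """Grassberger-Procaccia estimate of D2 from the slope of log C(r) vs log r."""
    X = delay_embed(x, m, tau)
    diff = X[:, None, :] - X[None, :, :]
    # Euclidean distances of all distinct pairs (i < j), sorted for fast counting
    d = np.sqrt((diff ** 2).sum(axis=-1))[np.triu_indices(len(X), k=1)]
    d.sort()
    radii = np.logspace(np.log10(r_min), np.log10(r_max), n_r)
    # C(r): number of pairs with distance < r, divided by the number of pairs
    C = np.searchsorted(d, radii, side="left") / d.size
    if np.any(C == 0):
        raise ValueError("r_min too small: no pairs closer than r_min")
    slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
    return slope, radii, C
```

In practice the slope is computed for increasing m; if it saturates as m grows, the saturated value is taken as D2, and a small non-integer value is read as a sign of low-dimensional nonlinear dynamics.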
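The prediction model maps a reconstructed delay vector to the next sample through an RBF network. A minimal sketch under stated assumptions: Gaussian basis functions with centres from k-means, one shared width set by the d_max/√(2K) heuristic, bias-augmented output weights fitted by linear least squares, and a least-squares linear predictor on the same delay vectors as the baseline. The abstract does not say how its network is trained or which four evaluation indexes it uses, so the training scheme and the RMSE, MAE and NMSE measures here are illustrative choices, and all names are hypothetical.

```python
import numpy as np


def make_dataset(x, m, tau):
    """Inputs X_i = [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}], target x_{i+(m-1)tau+1}."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau - 1
    X = np.column_stack([x[k * tau : k * tau + n] for k in range(m)])
    y = x[(m - 1) * tau + 1 :]
    return X, y


class RBFPredictor:
    """Gaussian RBF network: k-means centres, shared width, least-squares output layer."""

    def __init__(self, n_centers=30, n_iter=20, seed=0):
        self.n_centers, self.n_iter, self.seed = n_centers, n_iter, seed

    def _design(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=-1)
        phi = np.exp(-d2 / (2.0 * self.width ** 2))
        return np.hstack([phi, np.ones((len(X), 1))])    # last column = output bias

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.centers = X[rng.choice(len(X), self.n_centers, replace=False)].copy()
        for _ in range(self.n_iter):                      # plain k-means (Lloyd) refinement
            labels = np.argmin(((X[:, None, :] - self.centers[None]) ** 2).sum(-1), axis=1)
            for c in range(self.n_centers):
                if np.any(labels == c):                   # empty clusters keep their centre
                    self.centers[c] = X[labels == c].mean(axis=0)
        spread = np.sqrt(((self.centers[:, None] - self.centers[None]) ** 2).sum(-1)).max()
        self.width = spread / np.sqrt(2 * self.n_centers)
        self.w, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._design(X) @ self.w


def fit_linear(X, y):
    """Least-squares linear predictor (with intercept) on the same delay vectors."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ coef


def error_measures(y, y_hat):
    e = y - y_hat
    return {"RMSE": float(np.sqrt(np.mean(e ** 2))),
            "MAE": float(np.mean(np.abs(e))),
            "NMSE": float(np.mean(e ** 2) / np.var(y))}
```

A typical run builds `X, y = make_dataset(frame, m, tau)` with the delay and dimension estimated earlier, fits both predictors on the first part of the series, and compares `error_measures` on the held-out part. Fitting only the output layer by least squares keeps training linear once the centres are fixed, which is one common reason for choosing RBF networks over back propagation.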
Acknowledgements
The work reported in this paper was supported by the National Natural Science Foundation of China (NSFC) under Grant 11847163 and, in part, by the Gansu Education Department project under Grant 2021B-27 and the Qingyang Science and Technology Planning Project under Grant QY2021A-G004. The author thanks the referees for their valuable suggestions and comments.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Gao, X. A nonlinear prediction model for Chinese speech signal based on RBF neural network. Multimed Tools Appl 81, 5033–5049 (2022). https://doi.org/10.1007/s11042-021-11612-6
DOI: https://doi.org/10.1007/s11042-021-11612-6