Choosing the parameters of the NARMA model implemented with the recurrent perceptron for speech prediction

Original Article · Neural Computing and Applications

Abstract

Speech signals are statistically nonstationary and cannot be modelled adequately by the classical linear parametric models (AR, MA, ARMA). The neural-network approach to time-series prediction is well suited to learning and recognizing the nonlinear nature of the speech signal. We present a neural implementation of the NARMA (nonlinear ARMA) model and test it on a class of speech signals spoken by both men and women in different dialects of English. Akaike's information criterion is proposed for selecting the parameters of the NARMA model.
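As a concrete illustration of the selection step, the sketch below shows how Akaike's information criterion could be evaluated for competing NARMA(p, q) orders under a Gaussian residual assumption. This is a minimal sketch, not the author's implementation; the function names and the commented selection loop (including the hypothetical fit_narma trainer) are assumptions.

```python
import numpy as np

def aic_gaussian(residuals: np.ndarray, n_params: int) -> float:
    """Akaike's information criterion for a model with Gaussian
    one-step prediction residuals: AIC = N ln(sigma^2) + 2k
    (up to an additive constant), where sigma^2 is the residual
    variance and k the number of free parameters."""
    n = residuals.size
    sigma2 = np.mean(residuals ** 2)
    return n * np.log(sigma2) + 2 * n_params

# Hypothetical selection loop over candidate NARMA(p, q) orders;
# fit_narma(signal, p, q) stands in for training the recurrent
# perceptron and returning its one-step prediction residuals.
#
# best_pq = min(
#     ((p, q) for p in range(1, 6) for q in range(1, 6)),
#     key=lambda pq: aic_gaussian(fit_narma(signal, *pq),
#                                 n_params=pq[0] + pq[1] + 1),
# )
```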



Acknowledgments

The author would like to thank Professor Monica Dumitrescu and Professor Ion Văduva for their advice and support.

Author information

Correspondence to Marina-Anca Cidotă.

Appendix

Proof of Proposition 1

We assume that \( \hat{w}\left( {k|k - 1} \right) \) and \( P\left( {k|k - 1} \right) \) have been calculated. From Theorem 1, it follows that

$$ \begin{aligned} \hat{w}\left( {k + 1|k} \right) = & E\left( {w(k + 1)z(k)^{T} } \right)E\left( {z(k)z(k)^{T} } \right)^{ + } z(k) \\ = & E\left( {\left( {w(k) + q(k)} \right)z(k)^{T} } \right)E\left( {z(k)z(k)^{T} } \right)^{ + } z(k). \\ \end{aligned} $$
(6)

But E(q(k)z(k)T) = 0, because in the definition of the Kalman filter we assumed that \( E\left( {q(k)w(j)^{T} } \right) = 0\quad {\text{for}}\;j \le k, \) \( E\left( {r(k)q(j)} \right) = 0\quad \forall \,j,\,k \) and \( E\left( {q(j)} \right) = 0\quad \forall \,j. \)

It follows that

$$ \hat{w}\left( {k + 1|k} \right) = E\left( {w(k)z(k)^{T} } \right)E\left( {z(k)z(k)^{T} } \right)^{ + } z(k) = \hat{w}\left( {k|k} \right) $$
(7)
$$ \begin{aligned} P\left( {k + 1|k} \right) = & E\left( {\left( {\hat{w}\left( {k + 1|k} \right) - w\left( {k + 1} \right)} \right)\left( {\hat{w}\left( {k + 1|k} \right) - w\left( {k + 1} \right)} \right)^{T} } \right) \\ = & E\left( {\left( {\hat{w}\left( {k|k} \right) - w\left( k \right) - q\left( k \right)} \right)\left( {\hat{w}\left( {k|k} \right) - w\left( k \right) - q\left( k \right)} \right)^{T} } \right) \\ = & P\left( {k|k} \right) + Q. \\ \end{aligned} $$
(8)

Because \( t\left( k \right) = h\left( {\hat{w}\left( {k|k - 1} \right)} \right) - H\left( k \right)\hat{w}\left( {k|k - 1} \right) \) depends only on z(k−1) = [s(1),…, s(k−1), t(1),…, t(k−1)]T, we can assume that \( \hat{w}\left( {k|k - 1,\,t(k)} \right) \), the best linear minimum-variance estimator of w(k) based on [s(1),…, s(k−1), t(1),…, t(k)]T, can be approximated by \( \hat{w}\left( {k|k - 1} \right) \). Applying Theorem 2, we obtain

$$ \begin{aligned} \hat{w}\left( {k|k} \right) = & \hat{w}\left( {k|k - 1,\,t(k)} \right) + K\left( k \right)\times\left[ {s\left( k \right) - H\left( k \right)\hat{w}\left( {k|k - 1,\,t(k)} \right) - t\left( k \right)} \right] \\ = & \hat{w}\left( {k|k - 1} \right) + K\left( k \right)\left[ {s\left( k \right) - H\left( k \right)\hat{w}\left( {k|k - 1} \right) - t\left( k \right)} \right], \\ \end{aligned} $$
(9)
$$ P\left( {k|k} \right) = P\left( {k|k - 1} \right) - K\left( k \right)H\left( k \right)P\left( {k|k - 1} \right) $$
(10)

where we denote \( K\left( k \right) = P\left( {k|k - 1} \right)H\left( k \right)^{T} \left[ {C + H\left( k \right)P\left( {k|k - 1} \right)H\left( k \right)^{T} } \right]^{ + } . \)

Because \( C + H\left( k \right)P\left( {k|k - 1} \right)H\left( k \right)^{T} \in \Re \) is a (nonzero) scalar, the pseudoinverse reduces to the ordinary inverse:

$$ \left[ {C + H\left( k \right)P\left( {k|k - 1} \right)H\left( k \right)^{T} } \right]^{ + } = \left[ {C + H\left( k \right)P\left( {k|k - 1} \right)H\left( k \right)^{T} } \right]^{ - 1} $$
(11)

Finally, combining (6)–(11) yields the conclusion of Proposition 1. □
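Read as an algorithm, Equations (6)–(11) describe one extended-Kalman-filter step for the perceptron weights. The following is a minimal NumPy sketch of that step under the paper's notation; the function name, array shapes, and the treatment of C as a scalar measurement-noise variance are assumptions, not code from the paper.

```python
import numpy as np

def ekf_step(w, P, H, s, t, Q, C):
    """One combined measurement/time update from Proposition 1.

    w : (n,)  prior weight estimate  w_hat(k | k-1)
    P : (n,n) prior error covariance P(k | k-1)
    H : (n,)  linearization row H(k)
    s : float observed sample s(k)
    t : float linearization offset t(k) = h(w) - H @ w
    Q : (n,n) process-noise covariance
    C : float measurement-noise variance
    """
    # Kalman gain K(k) = P H^T [C + H P H^T]^(-1)  -- Eqs. (9)-(11);
    # the innovation variance is a scalar, so no pseudoinverse is needed.
    S = C + H @ P @ H
    K = (P @ H) / S
    # Measurement update, Eqs. (9)-(10)
    w_post = w + K * (s - H @ w - t)
    P_post = P - np.outer(K, H) @ P
    # Time update, Eqs. (7)-(8): the weights follow a random walk,
    # so the estimate is unchanged and the covariance grows by Q.
    w_pred = w_post
    P_pred = P_post + Q
    return w_pred, P_pred
```

Note that the order matches the proposition: the measurement update uses the prior quantities, and the time update then simply adds Q because the weights are modelled as a random walk.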

Proof of Proposition 2

From Theorem 1, we have

$$ \begin{aligned} \hat{s}\left( {k + 1|k} \right) = & E\left( {s\left( {k + 1} \right)z\left( k \right)^{T} } \right)E\left( {z\left( k \right)z\left( k \right)^{T} } \right)^{ + } z\left( k \right) \\ = & E\left( {\left( {H\left( {k + 1} \right)w\left( {k + 1} \right) + r\left( {k + 1} \right) + t\left( {k + 1} \right)} \right)z\left( k \right)^{T} } \right)E\left( {z\left( k \right)z\left( k \right)^{T} } \right)^{ + } z\left( k \right). \\ \end{aligned} $$

As r is zero-mean and uncorrelated with the past data (in particular, we assumed that \( E\left( {r(k)q(j)} \right) = 0\quad \forall \,j,\,k \)), the term in r(k + 1) vanishes and it follows that

$$ \begin{aligned} \hat{s}\left( {k + 1|k} \right) = & H\left( {k + 1} \right)E\left( {w\left( {k + 1} \right)z(k)^{T} } \right)E\left( {z(k)z(k)^{T} } \right)^{ + } z(k) \\ + & E\left( {t\left( {k + 1} \right)z(k)^{T} } \right)E\left( {z(k)z\left( k \right)^{T} } \right)^{ + } z(k), \\ \end{aligned} $$

and applying Theorem 1 again, we obtain

$$ \hat{s}\left( {k + 1|k} \right) = H\left( {k + 1} \right)\hat{w}\left( {k + 1|k} \right) + \hat{t}\left( {k + 1|k} \right), $$

where we denote by \( \hat{t}\left( {k + 1|k} \right) \) the best linear minimum variance estimator of t(k + 1) based on z(k) = [s(1),…, s(k), t(1),…, t(k)]T.

It follows that

$$ \hat{s}\left( {k + 1|k} \right) = H\left( {k + 1} \right)\hat{w}\left( {k + 1|k} \right) + t\left( {k + 1} \right), $$

because by definition \( t\left( {k + 1} \right) = h\left( {\hat{w}\left( {k + 1|k} \right)} \right) - H\left( {k + 1} \right)\hat{w}\left( {k + 1|k} \right) \) depends only on z(k) = [s(1),…, s(k), t(1),…, t(k)]T, so we can assume that \( \hat{t}\left( {k + 1|k} \right) \approx t\left( {k + 1} \right). \)

But

$$ \begin{aligned} \hat{s}\left( {k + 1|k} \right) - s\left( {k + 1} \right) = & H\left( {k + 1} \right)\hat{w}\left( {k + 1|k} \right) + t\left( {k + 1} \right) - H\left( {k + 1} \right)w\left( {k + 1} \right) - r\left( {k + 1} \right) - t\left( {k + 1} \right) \\ = & H\left( {k + 1} \right)\left( {\hat{w}\left( {k + 1|k} \right) - w\left( {k + 1} \right)} \right) - r\left( {k + 1} \right), \\ \end{aligned} $$

so

$$ \begin{aligned} E\left( {\left( {\hat{s}\left( {k + 1|\,k} \right) - s\left( {k + 1} \right)} \right)\left( {\hat{s}\left( {k + 1|k} \right) - s\left( {k + 1} \right)} \right)^{T} } \right) \\ = & E\left( {\left( {H\left( {k + 1} \right)\left( {\hat{w}\left( {k + 1|\,k} \right) - w\left( {k + 1} \right)} \right) - r\left( {k + 1} \right)} \right)\left( {H\left( {k + 1} \right)\left( {\hat{w}\left( {k + 1|\,k} \right) - w\left( {k + 1} \right)} \right) - r\left( {k + 1} \right)} \right)^{T} } \right) \\ = & H\left( {k + 1} \right)P\left( {k + 1|k} \right)H\left( {k + 1} \right)^{T} + C, \\ \end{aligned} $$

because we assumed in the filter definition that

$$ E\left( {r(k)w(j)^{T} } \right) = 0\quad \forall \,j,\,k $$

and r ~ N(0, C). □
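In code, Proposition 2 says that the one-step prediction is simply the network output evaluated at the predicted weights, with error variance H(k + 1)P(k + 1|k)H(k + 1)T + C. Below is a minimal sketch, with the nonlinearity h and its linearization row H supplied by the caller; all names are assumed, not taken from the paper.

```python
import numpy as np

def predict_sample(w_pred, P_pred, h, H, C):
    """One-step output prediction from Proposition 2.

    s_hat = H w + t = h(w), since t = h(w) - H w, and the
    prediction-error variance is H P H^T + C.
    """
    s_hat = h(w_pred)            # predicted speech sample s_hat(k+1 | k)
    var = H @ P_pred @ H + C     # scalar prediction-error variance
    return s_hat, var
```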


Cite this article

Cidotă, M.-A. Choosing the parameters of the NARMA model implemented with the recurrent perceptron for speech prediction. Neural Comput & Applic 19, 903–910 (2010). https://doi.org/10.1007/s00521-010-0375-7
