Abstract
Recurrent neural networks (RNNs) have recently achieved remarkable improvements in acoustic modeling. However, their potential has not been exploited for modeling Urdu acoustics. Connectionist temporal classification and attention-based RNNs suffer from the unavailability of a lexicon and from the computational cost of training, respectively. We therefore explored contemporary long short-term memory (LSTM) and gated recurrent unit (GRU) networks for Urdu acoustic modeling. The efficacies of plain, deep, bidirectional, and deep bidirectional network architectures were evaluated empirically. The results indicate that the deep bidirectional architecture has an advantage over the others: it achieved a word error rate of 20% on a hundred-word dataset of twenty speakers, a 15% improvement over the baseline single-layer LSTM. We also observed that two-layer architectures improve performance over single-layer ones, whereas performance degrades with further layers. Finally, a comparison of LSTM-based with GRU-based architectures showed that LSTM has an advantage over GRU.
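The bidirectional processing compared in the abstract can be sketched in a few lines: a recurrent cell is run over the input frames left-to-right and right-to-left, and the two hidden-state sequences are concatenated at each frame. This is a minimal NumPy illustration; the dimensions and the plain tanh cell are assumptions for the sketch, whereas the paper's models use LSTM and GRU cells.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 5, 13, 8            # frames, feature dim (e.g. MFCCs), hidden units
x = rng.standard_normal((T, D))
Wx = rng.standard_normal((D, H)) * 0.1
Wh = rng.standard_normal((H, H)) * 0.1

def run(frames):
    """Run a simple tanh recurrent cell over a sequence of frames."""
    h = np.zeros(H)
    outs = []
    for f in frames:
        h = np.tanh(f @ Wx + h @ Wh)   # biases omitted, as in the paper
        outs.append(h)
    return np.stack(outs)

fwd = run(x)                   # left-to-right pass
bwd = run(x[::-1])[::-1]       # right-to-left pass, re-aligned to frame order
bi = np.concatenate([fwd, bwd], axis=1)
print(bi.shape)                # (5, 16): each frame sees both directions
```

Stacking two such bidirectional layers, with the lower layer's concatenated outputs fed as the upper layer's input, gives the "deep bidirectional" configuration that performed best in the experiments.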
Notes
Sequence processing with neural networks is usually performed by operating over a context window at the first layer. For notational convenience, we have not considered a context window in this section.
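The context window mentioned in the note can be illustrated as follows: each feature frame is concatenated with its c left and right neighbours before being fed to the first layer. The window size and the edge-repetition padding are assumptions made for this sketch.

```python
import numpy as np

def add_context(frames, c):
    """Stack each frame with c neighbour frames on each side (edges repeated)."""
    padded = np.pad(frames, ((c, c), (0, 0)), mode="edge")
    return np.stack([padded[i : i + 2 * c + 1].ravel()
                     for i in range(len(frames))])

feats = np.arange(12, dtype=float).reshape(4, 3)   # 4 frames, 3 features each
windowed = add_context(feats, c=2)
print(windowed.shape)   # (4, 15): each frame now spans 5 frames x 3 features
```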
Biases are omitted throughout the paper for simplicity.
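With biases omitted, a single LSTM step reduces to the gate equations below. This NumPy sketch follows the standard LSTM cell formulation; the dimensions and the single-matrix-per-gate weight layout are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 13, 8                        # input and hidden dimensions (assumed)
x = rng.standard_normal(D)
h_prev, c_prev = np.zeros(H), np.zeros(H)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, acting on the concatenation [x; h_prev].
W = {g: rng.standard_normal((D + H, H)) * 0.1 for g in "ifog"}
z = np.concatenate([x, h_prev])

i = sigmoid(z @ W["i"])             # input gate
f = sigmoid(z @ W["f"])             # forget gate
o = sigmoid(z @ W["o"])             # output gate
g = np.tanh(z @ W["g"])             # candidate cell update
c = f * c_prev + i * g              # new cell state
h = o * np.tanh(c)                  # new hidden state
print(h.shape, c.shape)             # (8,) (8,)
```

A GRU cell merges the input and forget gates into a single update gate and drops the separate cell state, which is why it has fewer parameters than the LSTM compared in the experiments.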
Center for Language Engineering. [Online]. Available: http://www.cle.org.pk.
python_speech_features toolkit. [Online]. Available: https://python-speech-features.readthedocs.io/en/latest/.
Colaboratory. [Online]. Available: https://colab.research.google.com/notebooks/welcome.ipynb#recent=true.
Zia, T., Zahid, U. Long short-term memory recurrent neural network architectures for Urdu acoustic modeling. Int J Speech Technol 22, 21–30 (2019). https://doi.org/10.1007/s10772-018-09573-7