Abstract
In this paper, we investigate Artificial Neural Network (ANN) based Automatic Speech Recognition (ASR) using limited-vocabulary Arabic corpora. The vocabulary subsets are digits and vowels carried by specific carrier words. Hidden Markov Model (HMM) based ASR systems are also designed and compared, on the same corpora, with two ANN based systems, namely Multilayer Perceptron (MLP) and recurrent architectures. All systems are isolated-word speech recognizers. On digits, the ANN based recognition system achieved 99.5% correct recognition, whereas the HMM based recognition system achieved 98.1%. On the vowel carrier words, the MLP and recurrent ANN based recognition systems achieved 92.13% and 98.06% correct vowel recognition, respectively, whereas the HMM based recognition system achieved 91.6%.
Cite this article
Alotaibi, Y.A. Comparing ANN to HMM in implementing limited Arabic vocabulary ASR systems. Int J Speech Technol 15, 25–32 (2012). https://doi.org/10.1007/s10772-011-9107-3