Abstract
A phoneme identification system for the Arabic language has been developed. It is based on a hybrid approach that incorporates two layers of phoneme identification. In the first layer, power spectral information, efficiently condensed through singular value decomposition, is used to train a separate self-organizing map for each Arabic phoneme. The second layer, based on a similarity metric, compares the standard pitch contours of the phonemes with the pitch contour of the input sound. The second layer performs the identification when the first layer generates multiple classifications of the same input sound. The system has been developed using utterances of twenty-eight Arabic phonemes from over a hundred speakers. The identification accuracy of the first layer alone was recorded at 71%, which increased to 91% with the addition of the second identification layer. Training on singular values instead of on the power spectral densities directly reduced the training and recognition times of the self-organizing maps by 80% and 89%, respectively. The research concludes that power spectral densities together with pitch information yield an acceptable and robust identification system for Arabic phonemes.
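The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: condensing power spectral information with singular value decomposition, and resolving multiple first-layer candidates by pitch-contour similarity. The function names, the frame length and hop size, the choice of framewise periodograms, and the use of Euclidean distance as the similarity metric are all assumptions for illustration, not the authors' method. Training of the self-organizing maps is omitted.

```python
import numpy as np

def psd_matrix(signal, frame_len=256, hop=128):
    """Stack per-frame periodograms into a (frames x frequency bins) matrix.

    Frame length, hop size, and the Hann window are illustrative assumptions.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len

def singular_value_features(psd, k=5):
    """Condense a PSD matrix to its k largest singular values."""
    # With compute_uv=False, NumPy returns only the singular values,
    # sorted in descending order.
    s = np.linalg.svd(psd, compute_uv=False)
    return s[:k]

def resolve_by_pitch(input_contour, reference_contours, candidates):
    """Second layer (assumed form): choose the candidate phoneme whose
    reference pitch contour is closest to the input contour."""
    def distance(label):
        ref = reference_contours[label]
        # Resample the reference onto the input's time axis so that
        # contours of different lengths can be compared point by point.
        x_in = np.linspace(0.0, 1.0, len(input_contour))
        x_ref = np.linspace(0.0, 1.0, len(ref))
        return np.linalg.norm(input_contour - np.interp(x_in, x_ref, ref))
    return min(candidates, key=distance)
```

In this sketch, a 1024-sample signal yields a 7 x 129 PSD matrix that is reduced to a handful of singular values, which is the kind of compression the abstract credits for the shorter training and recognition times. `resolve_by_pitch` would be called only when the first layer returns more than one candidate label.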
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Awais, M.M., Masud, S., Shamail, S., Akhtar, J. (2004). A Hybrid Multi-layered Speaker Independent Arabic Phoneme Identification System. In: Yang, Z.R., Yin, H., Everson, R.M. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2004. IDEAL 2004. Lecture Notes in Computer Science, vol 3177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28651-6_61
DOI: https://doi.org/10.1007/978-3-540-28651-6_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22881-3
Online ISBN: 978-3-540-28651-6
eBook Packages: Springer Book Archive