Abstract
Automatic speech recognition systems are prone to errors when the dictionary contains confusable words. This paper proposes a new approach to the problem: a human-machine speech interaction language (HUMSIL) built from acoustically orthogonal words. To minimize pronunciation variation among speakers of different nationalities, we selected a subset of phonemes common to the world's major languages and generated a vocabulary from it using the algorithm described in this paper. We performed two experiments comparing English, Turkish and HUMSIL on digit recognition, using microphone recordings from multinational speakers. In both experiments, the proposed vocabulary yielded a significantly lower error rate.
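The idea of picking mutually distinct words from a restricted phoneme set can be illustrated with a minimal sketch. It is not the algorithm of the paper, which the abstract does not spell out. The phoneme inventory, the CVCV word shape, the position-wise phoneme mismatch count used as a stand-in for acoustic distance, and the function names are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical phoneme inventory (assumption): the paper's actual
# cross-language subset is not listed in the abstract.
CONSONANTS = ["p", "t", "k", "m", "n", "s", "l"]
VOWELS = ["a", "i", "u"]


def candidate_words():
    """Enumerate consonant-vowel-consonant-vowel candidates as phoneme tuples."""
    return list(product(CONSONANTS, VOWELS, CONSONANTS, VOWELS))


def distance(a, b):
    """Count positions where the phonemes differ.

    A crude proxy (assumption) for acoustic distance; a real system would
    use a measured phoneme confusion matrix or model-based distances.
    """
    return sum(x != y for x, y in zip(a, b))


def select_vocabulary(candidates, size):
    """Greedy farthest-point selection.

    Start from the first candidate, then repeatedly add the candidate whose
    smallest distance to the already chosen words is largest.
    """
    chosen = [candidates[0]]
    while len(chosen) < size:
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: min(distance(c, w) for w in chosen),
        )
        chosen.append(best)
    return chosen


# Ten words, one per digit.
vocab = select_vocabulary(candidate_words(), 10)
for digit, word in enumerate(vocab):
    print(digit, "".join(word))
```

Greedy selection does not guarantee the best possible spread, but it is simple and keeps every new word as far as it can from the words already chosen. Replacing `distance` with a confusability measure derived from recognizer output would bring the sketch closer to the acoustic orthogonality described in the abstract.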
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arısoy, E., Arslan, L.M. (2004). A Universal Human Machine Speech Interaction Language for Robust Speech Recognition Applications. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_33
DOI: https://doi.org/10.1007/978-3-540-30120-2_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23049-6
Online ISBN: 978-3-540-30120-2
eBook Packages: Springer Book Archive