Abstract
This article examines voice activity detection (VAD) and its use in a Kazakh speech recognition system. It presents a mathematical model of VAD and methods for detecting pauses in voice data: pauses between sentences, between words, and between individual sounds. The VAD algorithm is adapted to Kazakh speech recognition by taking into account the basic properties of the Kazakh language. Research on voice activity detection for Kazakh speech is being conducted here for the first time. The results of the spectral analysis are shown in a figure.
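As a rough illustration of what pause detection involves, the sketch below flags frames by short-time energy and measures the length of each pause. It is not the authors' model, which the abstract does not specify: the energy criterion, the frame length of 160 samples, the threshold of 0.01, the 8 kHz sampling rate, and all function names are assumptions chosen for the example.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_speech(samples, frame_len=160, threshold=0.01):
    """Flag each frame as speech (True) or pause (False).

    The fixed energy threshold is an assumption made for this sketch.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]

def pause_lengths(flags):
    """Lengths, in frames, of each run of consecutive pause frames."""
    runs, current = [], 0
    for is_speech in flags:
        if is_speech:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs

# Synthetic test signal (assumed 8 kHz rate): silence, a 200 Hz tone, silence.
rate = 8000
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]

flags = detect_speech(silence + tone + silence)
print(flags)
print(pause_lengths(flags))
```

In a scheme like this, the pause lengths could be compared against duration thresholds to separate pauses between sentences, words, and individual sounds. Those thresholds would have to be tuned to the language and the recording conditions.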





Cite this article
Kalimoldayev, M.N., Alimhan, K. & Mamyrbayev, O.J. Methods for applying VAD in Kazakh speech recognition systems. Int J Speech Technol 17, 199–204 (2014). https://doi.org/10.1007/s10772-013-9220-6
DOI: https://doi.org/10.1007/s10772-013-9220-6