Abstract
This paper reviews Voice Activity Detection (VAD) techniques that can be readily applied to on-device isolated digit recognition on a mobile device. The techniques investigated are Short Time Energy, Linear Predictive Coding (LPC) residual (prediction error), Discrete Fourier Transform (DFT) based linear cross-correlation, and K-means clustering based VAD. The best-performing technique was K-means clustering of the prediction error, which gives a recognition rate of 86.6%. This technique will be used with an LPC-based speech recognition algorithm for digit recognition on the mobile device.
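As a rough illustration of the idea behind K-means clustering of the prediction error, the sketch below computes the LPC residual energy of each frame with the Levinson-Durbin recursion and splits the frames into two clusters. It is a minimal sketch under stated assumptions and not the authors' implementation: the frame length (200 samples), hop (100 samples), LPC order (10), the choice of two clusters, the rule that the higher-error cluster is speech, and the synthetic test signal are all illustrative choices that do not come from the paper.

```python
import random


def lpc_error(frame, order=10):
    """Residual (prediction error) energy of one frame via Levinson-Durbin."""
    n = len(frame)
    # Autocorrelation at lags 0..order (autocorrelation method, no window).
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    if r[0] == 0.0:
        return 0.0  # all-zero frame: nothing to predict
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:
            return 0.0
    return err


def two_means(values, iters=50):
    """1-D K-means with k=2; returns True for members of the higher cluster."""
    lo, hi = min(values), max(values)  # assumed initialisation: the extremes
    labels = [False] * len(values)
    for _ in range(iters):
        new_labels = [abs(v - hi) < abs(v - lo) for v in values]
        high = [v for v, is_high in zip(values, new_labels) if is_high]
        low = [v for v, is_high in zip(values, new_labels) if not is_high]
        if high:
            hi = sum(high) / len(high)
        if low:
            lo = sum(low) / len(low)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def vad_kmeans_lpc(signal, frame_len=200, hop=100, order=10):
    """Label each frame as speech (True) or non-speech (False).

    Assumption: frames in the higher prediction-error cluster are speech.
    """
    starts = list(range(0, len(signal) - frame_len + 1, hop))
    errors = [lpc_error(signal[s:s + frame_len], order) for s in starts]
    return starts, two_means(errors)


# Hypothetical test signal: quiet noise, a louder "speech-like" AR(1) burst,
# then quiet noise again. It only stands in for a recorded digit.
random.seed(0)
noise = lambda amp, n: [random.uniform(-amp, amp) for _ in range(n)]
speech, y = [], 0.0
for e in noise(0.5, 1600):
    y = 0.9 * y + e
    speech.append(y)
signal = noise(0.01, 1600) + speech + noise(0.01, 1600)
starts, labels = vad_kmeans_lpc(signal)
```

Here `starts` holds the first sample index of each frame and `labels` marks the frames assigned to the higher-error cluster. Clustering avoids a hand-tuned threshold, but it assumes every recording contains both speech and non-speech frames.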
Acknowledgments
I would like to thank the Petroleum Technology Development Fund (PTDF) for their continued support and sponsorship of this research. I also thank my parents, my supervisor, and the colleagues who helped with the experiments and algorithms.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Mustafa, M.K., Allen, T., Evett, L. (2014). A Review of Voice Activity Detection Techniques for On-Device Isolated Digit Recognition on Mobile Devices. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXI. SGAI 2014. Springer, Cham. https://doi.org/10.1007/978-3-319-12069-0_23
DOI: https://doi.org/10.1007/978-3-319-12069-0_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12068-3
Online ISBN: 978-3-319-12069-0
eBook Packages: Computer Science (R0)