A Review of Voice Activity Detection Techniques for On-Device Isolated Digit Recognition on Mobile Devices

  • Conference paper

Abstract

This paper reviews Voice Activity Detection (VAD) techniques that can be readily applied to on-device isolated digit recognition on a mobile device. The techniques investigated are Short Time Energy, Linear Predictive Coding (LPC) residual (prediction error), Discrete Fourier Transform (DFT) based linear cross-correlation, and K-means clustering based VAD. The best-performing technique was K-means clustering of the prediction error, which gave a recognition rate of 86.6%. This technique will be used with an LPC-based speech recognition algorithm for digit recognition on the mobile device.
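As a rough illustration of how clustering can drive a VAD decision, the sketch below splits a signal into frames, computes one feature per frame, and uses two-cluster K-means to separate speech frames from non-speech frames. It is not the authors' implementation: the function names (`frame_signal`, `short_time_energy`, `two_means_labels`), the frame length of 256 samples, the hop of 128 samples, and the synthetic test signal are all assumptions made for this example. For brevity the feature is Short Time Energy; the technique the abstract reports as best clusters the LPC prediction error instead, which would replace `short_time_energy` with a per-frame prediction error.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (trailing samples are dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames.astype(float) ** 2, axis=1)


def two_means_labels(feature, n_iter=50):
    """1-D K-means with k=2; True marks frames assigned to the higher-mean cluster."""
    lo, hi = feature.min(), feature.max()  # initialise centroids at the extremes
    for _ in range(n_iter):
        is_hi = np.abs(feature - hi) < np.abs(feature - lo)
        if is_hi.all() or not is_hi.any():
            break  # one cluster is empty, nothing left to update
        new_lo, new_hi = feature[~is_hi].mean(), feature[is_hi].mean()
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return is_hi


# Synthetic example (assumed data): low-level noise with a tone burst in the middle.
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(12000)
x[4000:8000] += np.sin(2 * np.pi * 440 * np.arange(4000) / 8000)

speech = two_means_labels(short_time_energy(frame_signal(x)))
print(np.flatnonzero(speech)[[0, -1]])  # first and last frame flagged as speech
```

Clustering avoids a hand-tuned energy threshold: the boundary between the two clusters adapts to each recording, which is one reason a clustering-based VAD is attractive when recording conditions vary.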



Acknowledgments

I would like to thank the Petroleum Technology Development Fund (PTDF) for their continued support and sponsorship of this research, as well as my parents, my supervisor, and the colleagues who helped with the experiments and algorithms.

Author information

Corresponding author

Correspondence to M. K. Mustafa.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Mustafa, M.K., Allen, T., Evett, L. (2014). A Review of Voice Activity Detection Techniques for On-Device Isolated Digit Recognition on Mobile Devices. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXI. SGAI 2014. Springer, Cham. https://doi.org/10.1007/978-3-319-12069-0_23

  • DOI: https://doi.org/10.1007/978-3-319-12069-0_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12068-3

  • Online ISBN: 978-3-319-12069-0

  • eBook Packages: Computer Science, Computer Science (R0)
