Wavelet energy based voice activity detection and adaptive thresholding for efficient speech coding

International Journal of Speech Technology

Abstract

Over the last five decades, extensive research in speech compression has produced a variety of speech coding techniques, and the search for more efficient coders continues worldwide. This paper proposes an alternative method for speech coding. The proposed technique decomposes the signal with the discrete wavelet transform and uses wavelet energy to distinguish active voice regions from silence regions in the speech signal. Depending on the status of each region, the system applies a different thresholding strategy, which leads to better compression without any loss of speech intelligibility. The proposed method is evaluated using both qualitative and quantitative parameters. The paper also proposes an alternative to mean opinion score (MOS) values, hereafter called the System Recognition Rate.
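
The pipeline outlined in the abstract (wavelet decomposition, energy-based voice activity detection, region-dependent thresholding) can be illustrated with a minimal Python sketch. The abstract does not name the wavelet family, the number of decomposition levels, the frame length, or the threshold values, so everything below is an assumption chosen for illustration: a single-level Haar transform, 8-sample frames, hard thresholding, and the hypothetical parameters `vad_energy`, `active_t`, and `silence_t`. It should not be read as the authors' implementation.

```python
import math

def haar_dwt(frame):
    """One level of the orthonormal Haar DWT (assumed wavelet, for illustration)."""
    s = math.sqrt(2.0)
    approx = [(frame[i] + frame[i + 1]) / s for i in range(0, len(frame) - 1, 2)]
    detail = [(frame[i] - frame[i + 1]) / s for i in range(0, len(frame) - 1, 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def wavelet_energy(approx, detail):
    """Sum of squared wavelet coefficients in one frame."""
    return sum(c * c for c in approx) + sum(c * c for c in detail)

def hard_threshold(coeffs, t):
    """Zero every coefficient whose magnitude is below t."""
    return [c if abs(c) >= t else 0.0 for c in coeffs]

def code_signal(signal, frame_len=8, vad_energy=0.1, active_t=0.05, silence_t=0.5):
    """Return (reconstruction, per-frame voice flags, number of kept coefficients).

    All default parameter values are hypothetical, not taken from the paper.
    """
    recon, flags, kept = [], [], 0
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        a, d = haar_dwt(frame)
        # Voice activity decision: mean wavelet energy per sample vs. a fixed level.
        active = wavelet_energy(a, d) / frame_len > vad_energy
        # Adaptive thresholding: gentle in active frames, aggressive in silence.
        t = active_t if active else silence_t
        a, d = hard_threshold(a, t), hard_threshold(d, t)
        kept += sum(1 for c in a + d if c != 0.0)
        flags.append(active)
        recon.extend(haar_idwt(a, d))
    return recon, flags, kept

# Two frames of low-level alternating noise followed by two frames of a sinusoid.
noise = [0.01 * (-1) ** n for n in range(16)]
voiced = [math.sin(2 * math.pi * n / 8) for n in range(16)]
recon, flags, kept = code_signal(noise + voiced)
print(flags)  # [False, False, True, True]
print(kept)   # 16 of 32 coefficients survive thresholding
```

In this toy input the silent frames are reduced to zero coefficients while the active frames are reconstructed unchanged, which is the trade-off the abstract describes: fewer coefficients to encode without degrading the voiced regions.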

Corresponding author

Correspondence to Shijo M. Joseph.

Cite this article

Joseph, S.M., Babu, A.P. Wavelet energy based voice activity detection and adaptive thresholding for efficient speech coding. Int J Speech Technol 19, 537–550 (2016). https://doi.org/10.1007/s10772-014-9240-x

  • DOI: https://doi.org/10.1007/s10772-014-9240-x
