Speech enhancement and encoding by combining SS-VAD and LPC

International Journal of Speech Technology

Abstract

This paper demonstrates the encoding of noisy and enhanced speech data. To encode and enhance speech captured in an uncontrolled environment, linear predictive coding (LPC) and spectral subtraction with voice activity detection (SS-VAD) are first studied individually. The noisy speech data is formed by adding a noise model to a clean speech signal, and it is encoded using LPC. LPC is a lossy compression technique that reduces the data rate from 64 kbps to 2.4 kbps. Because of reverberation and other degradations in the noisy speech, the quality of the encoded noisy speech is poor. An algorithm is therefore proposed that enhances and encodes speech under degraded conditions by combining SS-VAD and LPC. In the first step, the noisy speech is encoded using LPC and its performance is evaluated using the signal-to-noise ratio. In the second step, the noisy speech is given as input to the SS-VAD algorithm, and the SS-VAD output is given as input to the LPC encoder. In the LPC encoder, coefficients are extracted from the input speech to design all-pole filters. At the analysis step, cross-correlation is also used to differentiate voiced from unvoiced samples. The pitch information and the extracted coefficients are then used in the synthesis step. Experiments are conducted on speech degraded by musical noise, F16 noise, factory noise, and car noise. The results show that the enhanced encoded speech produced by the proposed method is of significantly better quality than the encoded noisy speech. Output waveforms of LPC alone and of the proposed combined SS-VAD and LPC method are also presented.
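The LPC analysis step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which is not shown in this preview. It assumes the common autocorrelation method with the Levinson-Durbin recursion for the all-pole coefficients, and it assumes that the evaluation metric is a signal-to-noise ratio. The function names (`autocorrelation`, `levinson_durbin`, `snr_db`, `mix_at_snr`), the synthetic second-order test signal, the white-noise model, and the 5 dB mixing level are all hypothetical choices made for illustration.

```python
import math
import random


def autocorrelation(frame, max_lag):
    """Unnormalised autocorrelation sums r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) where a[0] == 1 and the all-pole filter is
    1 / A(z) with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err


def snr_db(reference, degraded):
    """SNR in dB, treating (reference - degraded) as the noise."""
    signal = sum(s * s for s in reference)
    error = sum((s - d) ** 2 for s, d in zip(reference, degraded))
    return 10.0 * math.log10(signal / error)


def mix_at_snr(clean, noise, target_db):
    """Scale the noise so that clean + noise has the requested SNR."""
    p_clean = sum(s * s for s in clean)
    p_noise = sum(v * v for v in noise)
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (target_db / 10.0)))
    return [s + gain * v for s, v in zip(clean, noise)]


# Synthetic stand-in for speech (assumption): a stable second-order
# all-pole process x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n].
rng = random.Random(0)
clean = [0.0, 0.0]
for _ in range(4000):
    clean.append(1.3 * clean[-1] - 0.6 * clean[-2] + rng.gauss(0.0, 1.0))

# White Gaussian noise stands in for the paper's noise types (assumption).
noise = [rng.gauss(0.0, 1.0) for _ in clean]
noisy = mix_at_snr(clean, noise, 5.0)

a_clean, _ = levinson_durbin(autocorrelation(clean, 2), 2)
a_noisy, _ = levinson_durbin(autocorrelation(noisy, 2), 2)

print("LPC coefficients, clean:", [round(c, 3) for c in a_clean])
print("LPC coefficients, noisy:", [round(c, 3) for c in a_noisy])
print("Input SNR (dB):", round(snr_db(clean, noisy), 2))
```

On the clean signal the estimated coefficients should land close to the generating filter, A(z) = 1 - 1.3 z^-1 + 0.6 z^-2. Additive noise shifts the estimate away from those values, which is consistent with the abstract's motivation for enhancing the speech before it reaches the LPC encoder.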

Author information

Corresponding author

Correspondence to Yadava G. Thimmaraja.

About this article

Cite this article

Thimmaraja, Y.G., Nagaraja, B.G. & Jayanna, H.S. Speech enhancement and encoding by combining SS-VAD and LPC. Int J Speech Technol 24, 165–172 (2021). https://doi.org/10.1007/s10772-020-09786-9

  • DOI: https://doi.org/10.1007/s10772-020-09786-9
