A Wideband Scalable Bit Rate Mixed Excitation Linear Prediction-Enhanced Speech Coder by Preserving Speaker-Specific Features

Circuits, Systems, and Signal Processing

Abstract

Mobile devices and services have grown significantly, fuelling an increasing demand for voice-activated applications. In this context, it is important to capture individual speaker characteristics in addition to the salient information in the speech signal. Efficient speech coders are therefore attractive when they achieve two goals together: a compact speech representation that maintains intelligibility and quality, and preservation of speaker-specific characteristics. This paper proposes a wideband scalable bit rate mixed excitation linear prediction-enhanced (MELPe) speech coder with an efficient representation of the excitation using glottal instants and linear predictive coding based on the mel scale. The instantaneous pitch, or epoch, is included in the excitation to obtain an accurate estimate of the glottal instants, a vital parameter in speaker recognition. By optimizing the bit requirement using speech category-based coding, the proposed wideband coder operates at bit rates ranging from 3.3 to 5.1 kbps, with an average bit rate of 3.6 kbps. At 3.6 kbps, the proposed coder provides perceptual quality, as measured by mean opinion score and perceptual evaluation of speech quality, similar to that of code excited linear prediction operating at 6.4 kbps. The performance of the proposed coder in speaker recognition is also analysed; it gives an equal error rate of 12.5%, which is very promising.
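The equal error rate quoted above is the operating point at which the false acceptance rate and the false rejection rate of a speaker verification system coincide. As a minimal sketch of how such a figure is typically obtained, and not the authors' evaluation code, the following Python function sweeps a decision threshold over similarity scores. The function name `equal_error_rate` and the toy scores are hypothetical and are not data from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) for similarity scores.

    Assumption: a higher score means "more likely the same speaker".
    """
    # Candidate thresholds: every distinct observed score.
    thresholds = sorted(set(genuine) | set(impostor))
    best = None
    for t in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical toy scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.3]
impostor_scores = [0.1, 0.2, 0.4, 0.6]
eer, thr = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.3f} at threshold {thr:.2f}")
```

With a finite set of trials the two error curves rarely cross exactly, so this sketch reports the midpoint of the two rates at the closest threshold; interpolating between neighbouring thresholds is a common alternative.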

Data Availability

The current study used two datasets, the TIMIT and James Hillenbrand databases, for performance analysis of various aspects. Information on these datasets, along with links for accessing them, is given in [18] and [22], respectively.

References

  1. G. Alipoor, M.H. Savoji, Wide-band speech coding based on bandwidth extension and sparse linear prediction. 2012 35th International Conference on Telecommunications and Signal Processing (TSP) (Prague, 2012), pp. 454–459

  2. T. Ananthapadmanabha, B. Yegnanarayana, Epoch extraction from linear prediction residual for identification of closed glottis interval. IEEE Trans. Acoust. Speech Signal Process. 27(4), 309–319 (1979)

  3. M.S. Arun Sankar, P.S. Sathidevi, An investigation on the degradation of different features extracted from the compressed American English speech using narrowband and wideband codecs. Int. J. Speech Technol. 21, 861–876 (2018). https://doi.org/10.1007/s10772-018-09559-5

  4. M.S. Arun-Sankar, P.S. Sathidevi, Design of MELPe-based variable-bit-rate speech coding with mel scale approach using low-order linear prediction filter and representing excitation signal using glottal closure instants. Arab. J. Sci. Eng. (2019). https://doi.org/10.1007/s13369-019-04273-z

  5. M.S. Arun-Sankar, P.S. Sathidevi, Mel scale-based linear prediction approach to reduce the prediction filter order in CELP paradigm. Circuits Syst. Signal Process. 40, 1–23 (2021). https://doi.org/10.1007/s00034-021-01647-3

  6. M.S. Athulya, P.S. Sathidevi, Speaker verification from codec distorted speech for forensic investigation through serial combination of classifiers. Digit. Investig. 25, 70–77 (2018). https://doi.org/10.1016/j.diin.2018.03.005

  7. M.S. Athulya, P.S. Sathidevi, Speaker verification from codec-distorted speech through combination of affine transform and feature switching. Circuits Syst. Signal Process. 40, 6016–6034 (2021)

  8. T. Backstrom, Speech coding. Signals and Communication Technology (Springer International Publishing AG, 2017) https://doi.org/10.1007/978-3-319-50204-5_5

  9. P. Boersma, D. Weenink, Praat: doing phonetics by computer. Version 6.0.40 (2018)

  10. A. Bouzid, N. Ellouze, Glottal opening instant detection from speech signal. 2004 12th European Signal Processing Conference (Vienna, 2004), pp. 729–732

  11. M. Bouzid, S.E. Cheraitia, M. Hireche, Switched split vector quantizer applied for encoding the LPC parameters of the 2.4 Kbits/s MELP speech coder. 2010 7th International Multi-conference on Systems, Signals and Devices (Amman, 2010), pp. 1–5

  12. S. Bruhn et al., Standardization of the new 3GPP EVS codec. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015), pp. 5703–5707. https://doi.org/10.1109/ICASSP.2015.7179064

  13. C. Cannam, C. Landone, M. Sandler, An open source application for viewing, analysing, and annotating music audio files. Proceedings of the ACM Multimedia 2010 International Conference, Firenze, Italy, October, pp. 1467–1468, 2010

  14. W.C. Chu, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders (Wiley, Hoboken, 2004)

  15. V. Cuperman et al., A novel approach to excitation coding in low-bit-rate high-quality CELP coders. 2000 IEEE Workshop on Speech Coding (Delavan, WI, USA, 2000), pp. 14–16

  16. A.M. De Lima Araujo, F. Violaro, Formant frequency estimation using a Mel-scale LPC algorithm. Telecommunications Symposium, 1998. ITS ’98 Proceedings vol. 1 (SBT/IEEE International, Sao Paulo, 1998), pp. 207–212

  17. T. Friedrich, G. Schuller, Spectral band replication tool for very low delay audio coding applications. 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (New Paltz, NY, USA, 2007), pp. 199–202

  18. J.S. Garofolo et al., TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1 (Linguistic Data Consortium, Philadelphia, 1993)

  19. J.D. Gibson, Challenges in speech coding research. in Speech and Audio Processing for Coding, Enhancement and Recognition. (Springer, 2015), pp. 19–39

  20. J.D. Gibson, Speech compression. Information 7(2), 32 (2016). https://doi.org/10.3390/info7020032

  21. A. Gray, J. Markel, Distance measures for speech processing. IEEE Trans. Acoust. Speech Signal Process. 24(5), 380–391 (1976)

  22. J.M. Hillenbrand, L.A. Getty, M.J. Clark, K. Wheeler, Acoustic characteristics of American English vowels. J. Acoust. Soc. Am. 97(5), 3099–3111 (1995)

  23. ITU-T Recommendation P.862.1, Mapping function for transforming P.862 raw result scores to MOS-LQO

  24. R. Jarina, J. Polacký, P. Poćta, M. Chmulik, Automatic speaker verification on narrowband and wideband lossy coded clean speech. IET Biom. 6, 276–281 (2017)

  25. G. Jyothish-Lal, E.A. Gopalakrishnan, D. Govind, Epoch estimation from emotional speech signals using variational mode decomposition. Circuits Syst. Signal Process. 37(8), 3245–3274 (2018)

  26. A. Krobba, M. Debyeche, S.A. Selouani, Maximum entropy PLDA for robust speaker recognition under speech coding distortion. Int. J. Speech Technol. 22, 1115–1122 (2019)

  27. E. Kruger, H.W. Strube, Linear prediction on a warped frequency scale speech processing. IEEE Trans. Acoust. Speech Signal Process. 36(9), 1529–1531 (1988)

  28. U.K. Laine, M. Karjalainen, T. Altosaar, Warped linear prediction (WLP) in speech and audio processing. 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1994. ICASSP-94, vol.3. Adelaide, SA, 1994, pp. III/349-III/352

  29. M. Lourakis, A brief description of the Levenberg–Marquardt algorithm implemented by levmar. Found. Res. Technol. 4, 1–6 (2005)

  30. R. Martin, R.V. Cox, New speech enhancement techniques for low bit rate speech coding. 1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351) (Porvoo, Finland, 1999), pp. 165–167

  31. A.V. McCree, T.P. Barnwell, A mixed excitation LPC vocoder model for low bit rate speech coding. IEEE Trans. Speech Audio Process. 3(4), 242–250 (1995)

  32. P. Nizampatnam, K.K. Tappeta, Bandwidth extension of narrowband speech using integer wavelet transform. IET Signal Process. 11(4), 437–445 (2017). https://doi.org/10.1049/iet-spr.2016.0453

  33. K.K. Paliwal, B.S. Atal, Efficient vector quantization of LPC parameters at 24 bits/frame. IEEE Trans. Speech Audio Process. 1(1), 3–14 (1993)

  34. D. Pravena, D. Govind, Significance of incorporating excitation source parameters for improved emotion recognition from speech and electroglottographic signals. Int. J. Speech Technol. 20(4), 787–797 (2017)

  35. L.R. Rabiner, R.W. Schafer, Digital Processing of Speech Signals (Prentice-Hall, New Jersey, 1978)

  36. K.S. Rao, B. Yegnanarayana, Prosody modification using instants of significant excitation. IEEE Trans. Audio Speech Lang. Process. 14(3), 972–980 (2006)

  37. A.W. Rix, J.G. Beerends, M.P. Hollier, A.P. Hekstra, Perceptual evaluation of speech quality (PESQ)—a new method for speech quality assessment of telephone networks and codecs. in 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221) vol.2 (Salt Lake City, UT, 2001), pp. 749–752

  38. S. Singh, The role of speech technology in biometrics, forensics and man-machine interface. Int. J. Electric. Comput. Eng. (IJECE) (2019). https://doi.org/10.11591/ijece.v9i1.pp281-288

  39. K. Sreenivasa Rao, B. Yegnanarayana, Prosodic manipulation using instants of significant excitation. in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP ’03) (Hong Kong, 2003), p. I

  40. C.J. van der Merwe, J.A. du Preez, Calculation of LPC-based cepstrum coefficients using mel-scale frequency warping. in COMSIG 1991 Proceedings: South African Symposium on Communications and Signal Processing (Pretoria, 1991), pp. 17–21

  41. R. Vergin, D. O’Shaughnessy, A. Farhat, Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition. IEEE Trans. Speech Audio Process. 7(5), 525–532 (1999)

  42. A.K. Vuppala, J. Yadav, S. Chakrabarti, K.S. Rao, Effect of low bit rate speech coding on epoch extraction. in 2011 International Conference on Devices and Communications (ICDeCom) (Mesra, 2011), pp. 1–4

  43. B. Yegnanarayana, Suryakanth V. Gangashetty, Epoch-based analysis of speech signals. Sadhana 36(5), 651–697 (2011)

  44. E.W.M. Yu, M.-W. Mak, S.-Y. Kung, Speaker verification from coded telephone speech using stochastic feature transformation and handset identification. in Pacific-Rim Conference on Multimedia (Springer, Berlin, 2002)

Acknowledgements

The authors would like to thank the Department of Science and Technology, Government of India, for supporting this work under FIST scheme No. SR/FST/ET-I/2017/68.

Author information

Corresponding author

Correspondence to M. S. Arun Sankar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sankar, M.S.A., Sathidevi, P.S. A Wideband Scalable Bit Rate Mixed Excitation Linear Prediction-Enhanced Speech Coder by Preserving Speaker-Specific Features. Circuits Syst Signal Process 42, 3437–3463 (2023). https://doi.org/10.1007/s00034-022-02277-z

  • DOI: https://doi.org/10.1007/s00034-022-02277-z
