A Wideband Scalable Bit Rate Mixed Excitation Linear Prediction-Enhanced Speech Coder by Preserving Speaker-Specific Features

Sankar, M. S. Arun; Sathidevi, P. S.

doi:10.1007/s00034-022-02277-z

Published: 04 January 2023

Volume 42, pages 3437–3463, (2023)

Circuits, Systems, and Signal Processing

Abstract

Mobile devices and services have grown significantly, fuelling an increasing demand for voice-activated applications. In this context, it is important to capture individual speaker characteristics in addition to the salient information in the speech signal. Efficient speech coders that meet two goals, a compact speech representation that maintains intelligibility and quality, and the preservation of speaker-specific characteristics, are therefore attractive. This paper proposes a wideband scalable bit rate mixed excitation linear prediction-enhanced (MELPe) speech coder that represents the excitation efficiently using glottal instants and applies linear predictive coding based on the mel scale. The instantaneous pitch, or epoch, is included in the excitation to obtain an accurate estimate of the glottal instants, a vital parameter in speaker recognition. By optimizing the bit allocation through speech category-based coding, the proposed wideband coder operates at bit rates from 3.3 to 5.1 kbps, with an average of 3.6 kbps. At 3.6 kbps, it provides perceptual quality, as measured by mean opinion score (MOS) and perceptual evaluation of speech quality (PESQ), similar to that of code-excited linear prediction (CELP) operating at 6.4 kbps. The performance of the proposed coder in speaker recognition is also analysed; it yields an equal error rate (EER) of 12.5%, which is promising.
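The abstract refers to linear predictive coding based on the mel scale, but this page does not give the authors' exact formulation. One widely used way to obtain an approximately mel-warped LP model is warped linear prediction (see Kruger and Strube, and Laine et al., in the reference list), in which each unit delay is replaced by a first-order all-pass section before the usual Levinson-Durbin recursion. The sketch below assumes that technique, 16 kHz sampling (the usual rate for wideband speech), a warping factor `lam = 0.42` (a value often used to approximate the mel scale at 16 kHz), and a hypothetical 20 ms frame; the function names are illustrative and this is not the authors' implementation.

```python
import numpy as np
from scipy.signal import lfilter


def warped_autocorrelation(frame, order, lam):
    """Return the warped autocorrelation r[0..order] of one analysis frame.

    Each unit delay z^-1 is replaced by the first-order all-pass section
    D(z) = (z^-1 - lam) / (1 - lam * z^-1). With lam = 0 this reduces to the
    ordinary (biased, unnormalised) autocorrelation.
    """
    x = np.asarray(frame, dtype=float)
    r = np.empty(order + 1)
    r[0] = np.dot(x, x)
    y = x
    for k in range(1, order + 1):
        # Pass the previous stage through one more all-pass section.
        y = lfilter([-lam, 1.0], [1.0, -lam], y)
        r[k] = np.dot(x, y)
    return r


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns the coefficient vector [1, a1, ..., ap] and the final
    prediction error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient of stage i
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    fs = 16000                     # assumed wideband sampling rate
    n = np.arange(320)             # hypothetical 20 ms frame at 16 kHz
    rng = np.random.default_rng(0)
    frame = np.sin(2 * np.pi * 500 * n / fs) + 0.1 * rng.standard_normal(n.size)
    frame *= np.hamming(n.size)    # taper the frame before analysis
    r = warped_autocorrelation(frame, order=10, lam=0.42)
    a, err = levinson_durbin(r, order=10)
    print("warped LP coefficients:", np.round(a, 3))
```

The resulting coefficients describe the predictor in the warped domain, so a decoder would have to apply the same all-pass substitution (or convert the coefficients) at synthesis. Because warping concentrates spectral resolution at low frequencies, a lower prediction order may suffice than with conventional LP, which is the motivation the mel-scale LP work in the reference list gives for reducing filter order.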

Data Availability

This study used two datasets, the TIMIT corpus and the James Hillenbrand database, to analyse various aspects of performance. Details of each, including links for access, are given in [18] and [22], respectively.

References

[1] G. Alipoor, M.H. Savoji, Wide-band speech coding based on bandwidth extension and sparse linear prediction. 2012 35th International Conference on Telecommunications and Signal Processing (TSP) (Prague, 2012), pp. 454–459
[2] T. Ananthapadmanabha, B. Yegnanarayana, Epoch extraction from linear prediction residual for identification of closed glottis interval. IEEE Trans. Acoust. Speech Signal Process. 27(4), 309–319 (1979)
[3] M.S. Arun Sankar, P.S. Sathidevi, An investigation on the degradation of different features extracted from the compressed American English speech using narrowband and wideband codecs. Int. J. Speech Technol. 21, 861–876 (2018). https://doi.org/10.1007/s10772-018-09559-5
[4] M.S. Arun Sankar, P.S. Sathidevi, Design of MELPe-based variable-bit-rate speech coding with mel scale approach using low-order linear prediction filter and representing excitation signal using glottal closure instants. Arab. J. Sci. Eng. (2019). https://doi.org/10.1007/s13369-019-04273-z
[5] M.S. Arun Sankar, P.S. Sathidevi, Mel scale-based linear prediction approach to reduce the prediction filter order in CELP paradigm. Circuits Syst. Signal Process. 40, 1–23 (2021). https://doi.org/10.1007/s00034-021-01647-3
[6] M.S. Athulya, P.S. Sathidevi, Speaker verification from codec distorted speech for forensic investigation through serial combination of classifiers. Digit. Investig. 25, 70–77 (2018). https://doi.org/10.1016/j.diin.2018.03.005
[7] M.S. Athulya, P.S. Sathidevi, Speaker verification from codec-distorted speech through combination of affine transform and feature switching. Circuits Syst. Signal Process. 40, 6016–6034 (2021)
[8] T. Backstrom, Speech coding. Signals and Communication Technology (Springer International Publishing AG, 2017). https://doi.org/10.1007/978-3-319-50204-5_5
[9] P. Boersma, D. Weenink, Praat: doing phonetics by computer. Version 6.0.40 (2018)
[10] A. Bouzid, N. Ellouze, Glottal opening instant detection from speech signal. 2004 12th European Signal Processing Conference (Vienna, 2004), pp. 729–732
[11] M. Bouzid, S.E. Cheraitia, M. Hireche, Switched split vector quantizer applied for encoding the LPC parameters of the 2.4 Kbits/s MELP speech coder. 2010 7th International Multi-conference on Systems, Signals and Devices (Amman, 2010), pp. 1–5
[12] S. Bruhn et al., Standardization of the new 3GPP EVS codec. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015), pp. 5703–5707. https://doi.org/10.1109/ICASSP.2015.7179064
[13] C. Cannam, C. Landone, M. Sandler, An open source application for viewing, analysing, and annotating music audio files. Proceedings of the ACM Multimedia 2010 International Conference (Firenze, Italy, 2010), pp. 1467–1468
[14] W.C. Chu, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders (Wiley, Hoboken, 2004)
[15] V. Cuperman et al., A novel approach to excitation coding in low-bit-rate high-quality CELP coders. 2000 IEEE Workshop on Speech Coding (Delavan, WI, USA, 2000), pp. 14–16
[16] A.M. De Lima Araujo, F. Violaro, Formant frequency estimation using a Mel-scale LPC algorithm. Telecommunications Symposium, 1998. ITS ’98 Proceedings, vol. 1 (SBT/IEEE International, Sao Paulo, 1998), pp. 207–212
[17] T. Friedrich, G. Schuller, Spectral band replication tool for very low delay audio coding applications. 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (New Paltz, NY, USA, 2007), pp. 199–202
[18] J.S. Garofolo et al., TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1 (Linguistic Data Consortium, Philadelphia, 1993)
[19] J.D. Gibson, Challenges in speech coding research. Speech and Audio Processing for Coding, Enhancement and Recognition (Springer, 2015), pp. 19–39
[20] J.D. Gibson, Speech compression. Information 7(2), 32 (2016). https://doi.org/10.3390/info7020032
[21] A. Gray, J. Markel, Distance measures for speech processing. IEEE Trans. Acoust. Speech Signal Process. 24(5), 380–391 (1976)
[22] J.M. Hillenbrand, L.A. Getty, M.J. Clark, K. Wheeler, Acoustic characteristics of American English vowels. J. Acoust. Soc. Am. 97, 3099–3111 (1995)
[23] ITU-T Recommendation P.862.1, Mapping function for transforming P.862 raw result scores to MOS-LQO
[24] R. Jarina, J. Polacký, P. Poćta, M. Chmulik, Automatic speaker verification on narrowband and wideband lossy coded clean speech. IET Biom. 6, 276–281 (2017)
[25] G. Jyothish-Lal, E.A. Gopalakrishnan, D. Govind, Epoch estimation from emotional speech signals using variational mode decomposition. Circuits Syst. Signal Process. 37(8), 3245–3274 (2018)
[26] A. Krobba, M. Debyeche, S.A. Selouani, Maximum entropy PLDA for robust speaker recognition under speech coding distortion. Int. J. Speech Technol. 22, 1115–1122 (2019)
[27] E. Kruger, H.W. Strube, Linear prediction on a warped frequency scale speech processing. IEEE Trans. Acoust. Speech Signal Process. 36(9), 1529–1531 (1988)
[28] U.K. Laine, M. Karjalainen, T. Altosaar, Warped linear prediction (WLP) in speech and audio processing. 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-94), vol. 3 (Adelaide, SA, 1994), pp. III/349–III/352
[29] M. Lourakis, A brief description of the Levenberg–Marquardt algorithm implemented by levmar. Found. Res. Technol. 4, 1–6 (2005)
[30] R. Martin, R.V. Cox, New speech enhancement techniques for low bit rate speech coding. 1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351) (Porvoo, Finland, 1999), pp. 165–167
[31] A.V. McCree, T.P. Barnwell, A mixed excitation LPC vocoder model for low bit rate speech coding. IEEE Trans. Speech Audio Process. 3(4), 242–250 (1995)
[32] P. Nizampatnam, K.K. Tappeta, Bandwidth extension of narrowband speech using integer wavelet transform. IET Signal Process. 11(4), 437–445 (2017). https://doi.org/10.1049/iet-spr.2016.0453
[33] K.K. Paliwal, B.S. Atal, Efficient vector quantization of LPC parameters at 24 bits/frame. IEEE Trans. Speech Audio Process. 1(1), 3–14 (1993)
[34] D. Pravena, D. Govind, Significance of incorporating excitation source parameters for improved emotion recognition from speech and electroglottographic signals. Int. J. Speech Technol. 20(4), 787–797 (2017)
[35] L.R. Rabiner, R.W. Schafer, Digital Processing of Speech Signals (Prentice-Hall, New Jersey, 1978)
[36] K.S. Rao, B. Yegnanarayana, Prosody modification using instants of significant excitation. IEEE Trans. Audio Speech Lang. Process. 14(3), 972–980 (2006)
[37] A.W. Rix, J.G. Beerends, M.P. Hollier, A.P. Hekstra, Perceptual evaluation of speech quality (PESQ)—a new method for speech quality assessment of telephone networks and codecs. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), vol. 2 (Salt Lake City, UT, 2001), pp. 749–752
[38] S. Singh, The role of speech technology in biometrics, forensics and man-machine interface. Int. J. Electric. Comput. Eng. (IJECE) (2019). https://doi.org/10.11591/ijece.v9i1.pp281-288
[39] K. Sreenivasa Rao, B. Yegnanarayana, Prosodic manipulation using instants of significant excitation. 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP ’03) (Hong Kong, 2003), p. I
[40] C.J. van der Merwe, J.A. du Preez, Calculation of LPC-based cepstrum coefficients using mel-scale frequency warping. COMSIG 1991 Proceedings: South African Symposium on Communications and Signal Processing (Pretoria, 1991), pp. 17–21
[41] R. Vergin, D. O’Shaughnessy, A. Farhat, Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition. IEEE Trans. Speech Audio Process. 7(5), 525–532 (1999)
[42] A.K. Vuppala, J. Yadav, S. Chakrabarti, K.S. Rao, Effect of low bit rate speech coding on epoch extraction. 2011 International Conference on Devices and Communications (ICDeCom) (Mesra, 2011), pp. 1–4
[43] B. Yegnanarayana, S.V. Gangashetty, Epoch-based analysis of speech signals. Sadhana 36(5), 651–697 (2011)
[44] E.W.M. Yu, M.-W. Mak, S.-Y. Kung, Speaker verification from coded telephone speech using stochastic feature transformation and handset identification. Pacific-Rim Conference on Multimedia (Springer, Berlin, 2002)

Acknowledgements

The authors would like to thank the Department of Science and Technology, Government of India, for supporting this work under FIST scheme No. SR/FST/ET-I/2017/68.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, National Institute of Technology Calicut, Calicut, Kerala, 673601, India
M. S. Arun Sankar & P. S. Sathidevi

Corresponding author

Correspondence to M. S. Arun Sankar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sankar, M.S.A., Sathidevi, P.S. A Wideband Scalable Bit Rate Mixed Excitation Linear Prediction-Enhanced Speech Coder by Preserving Speaker-Specific Features. Circuits Syst Signal Process 42, 3437–3463 (2023). https://doi.org/10.1007/s00034-022-02277-z

Received: 21 February 2022
Revised: 15 December 2022
Accepted: 16 December 2022
Published: 04 January 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s00034-022-02277-z
