Mel Scale-Based Linear Prediction Approach to Reduce the Prediction Filter Order in CELP Paradigm

Circuits, Systems, and Signal Processing

Abstract

This paper proposes a novel method to reduce the order of the prediction filter from 10 to 7 in the Code Excited Linear Prediction (CELP) coding framework by incorporating the psychoacoustic Mel scale into Linear Predictive Coding (Mel-LPC). Efficient quantization of the Mel-LPC parameters using 2-split Vector Quantization (VQ) gives a reduction of 4 bits/frame, for a total bit-rate saving of 200 bps. A weighting scheme for the Euclidean distance measure gives a reduction of 6 bits/frame, for a total bit-rate saving of 300 bps. A lower Mel-LPC order of 3 is employed for unvoiced frames, with perceptual quality as the selection criterion, and an efficient VQ method using 5 bits is developed for these frames, which brings the average bit requirement down to 11.5 bits/frame. To incorporate this into the Mel-LPC-based CELP encoding scheme, a neural network-based voiced-unvoiced classification algorithm that takes 5 derived features as input has been constructed. This selection of filter order based on signal statistics yields bit-rate reductions of 625 and 325 bps for 10th order LPC and 7th order Mel-LPC, respectively. In addition, the incorporation of Mel-LPC gives better performance in the estimation of formants.
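
The per-frame and per-second figures in the abstract can be related by a simple conversion. The sketch below is only an illustration: the frame rate of 50 frames/s is an assumption inferred from the stated pairs (4 bits/frame with 200 bps, 6 bits/frame with 300 bps) and is not given explicitly in the abstract, and the function names are hypothetical.

```python
from fractions import Fraction

# Assumption: 50 frames per second (20 ms frames), inferred from the
# abstract's pairing of 4 bits/frame with 200 bps. Not stated explicitly.
FRAMES_PER_SECOND = 50


def bits_per_frame_to_bps(bits_per_frame, frames_per_second=FRAMES_PER_SECOND):
    """Convert a per-frame bit saving into a bit-rate saving in bits/s."""
    return bits_per_frame * frames_per_second


def bps_to_bits_per_frame(bps, frames_per_second=FRAMES_PER_SECOND):
    """Convert a bit-rate saving in bits/s into an average per-frame saving."""
    return Fraction(bps, frames_per_second)


# Savings quoted in the abstract for the two quantization schemes.
split_vq_saving_bps = bits_per_frame_to_bps(4)
weighted_vq_saving_bps = bits_per_frame_to_bps(6)

# Average per-frame equivalents of the 625 bps and 325 bps reductions.
avg_saving_vs_lpc10 = bps_to_bits_per_frame(625)
avg_saving_vs_mel_lpc7 = bps_to_bits_per_frame(325)

print(split_vq_saving_bps, weighted_vq_saving_bps)
print(float(avg_saving_vs_lpc10), float(avg_saving_vs_mel_lpc7))
```

Under the same assumed frame rate, the 625 bps and 325 bps reductions would correspond to average savings of 12.5 and 6.5 bits/frame; the fractional values are plausible only as averages over a mix of voiced and unvoiced frames.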

References

  1. A.K.H. Al-Ali, D. Dean, B. Senadji, V. Chandran, G.R. Naik, Enhanced forensic speaker verification using a combination of DWT and MFCC feature warping in the presence of noise and reverberation conditions. IEEE Access 5, 15400–15413 (2017). https://doi.org/10.1109/ACCESS.2017.2728801

  2. A. Albahri, M. Lech, Effects of band reduction and coding on speech emotion recognition, 2016 International Conference on Signal Processing and Communication Systems, 12, 1–8 (2016)

  3. B.S. Atal, The history of linear prediction. IEEE Signal Process. Mag. 23(2), 154–161 (2006)

  4. P. Boersma, D. Weenink, Praat: doing phonetics by computer, Version 6.0.40 (2018)

  5. M. Bouzid, S.E. Cheraitia, M. Hireche, Switched split vector quantizer applied for encoding the LPC parameters of the 2.4 Kbits/s MELP speech coder, 2010 7th International Multi-Conference on Systems, Signals and Devices, Amman, 1–5 (2010)

  6. C. Cannam, C. Landone, M. Sandler, An Open Source Application for Viewing, Analysing, and Annotating Music Audio Files, Proceedings of the ACM Multimedia 2010 International Conference, Firenze, Italy, 1467–1468 (2010)

  7. W.C. Chu, Speech coding algorithms: foundation and evolution of standardized coders (Wiley, Hoboken, 2004)

  8. A.M. De Lima Araujo, F. Violaro, Formant frequency estimation using a Mel-scale LPC algorithm, Telecommunications Symposium, 1998. ITS ’98 Proceedings. SBT/IEEE International, Sao Paulo, 1, 207–212 (1998)

  9. H. Deng, D. O’Shaughnessy, Voiced-unvoiced-silence speech sound classification based on unsupervised learning, 2007 IEEE International Conference on Multimedia and Expo, Beijing, 176–179 (2007)

  10. N. Dey, A.S. Ashour, Direction of arrival estimation and localization of multi-speech sources. SpringerBriefs Electr. Comput. Eng. (2018). https://doi.org/10.1007/978-3-319-73059-2

  11. J.S. Garofolo et al., TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1 (Linguistic Data Consortium, Philadelphia, 1993)

  12. J.D. Gibson, Speech coding methods, standards, and applications. IEEE Circuits Syst. Mag. 5(4), 30–49 (2005)

  13. A. Gray, J. Markel, Distance measures for speech processing. IEEE Trans. Acoustics Speech Signal Process. 24(5), 380–391 (1976)

  14. J.M. Hillenbrand, L.A. Getty, M.J. Clark, K. Wheeler, Acoustic characteristics of American English vowels. J. Acoustical Soc. Am. 3011–3099 (1995)

  15. ITU-T Enhanced Voice Services (EVS) coder, Codec for Enhanced Voice Services (EVS); Performance Characterization, (2014)

  16. ITU-T. Recommendation P.862.1 Mapping function for transforming P.862 raw result scores to MOS-LQO, (2003)

  17. ITU-T G.720.1: Generic Sound Activity Detector (Series G: Transmission Systems and Media, Digital Systems and Networks: Digital Terminal Equipments - Coding of Voice and Audio Signals). Technical Report Telecommunication standardization sector of ITU (ITU-T). https://www.itu.int/rec/T-REC-G.720.1 (2010)

  18. R. Jarina, J. Polacký, P. Počta, M. Chmulík, Automatic speaker verification on narrowband and wideband lossy coded clean speech. IET Biometrics 6(4), 276–281 (2017)

  19. J. Polacký, P. Počta, R. Jarina, An impact of narrowband speech codec mismatch on a performance of GMM-UBM speaker recognition over telecommunication channel. Commun. Sci. Lett. Univ. Zilina 18, 23–28 (2016)

  20. S. Kadiri, A Quantitative Comparison of Epoch Extraction Algorithms for Telephone Speech, 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2019. Proceedings. (ICASSP ’19), 6500–6504 (2019)

  21. A.I. Koutrouvelis, G.P. Kafentzis, N.D. Gaubitch, R. Heusdens, A fast method for high-resolution voiced/unvoiced detection and glottal closure/opening instant estimation of speech. IEEE Trans. Audio Speech Lang. Process. 24(2), 316–328 (2016)

  22. E. Kruger, H.W. Strube, Linear prediction on a warped frequency scale. IEEE Trans. Acoustics Speech Signal Process. 36(9), 1529–1531 (1988)

  23. F. Labelle, R. Lefebvre, P. Gournay, A subjective evaluation of the effects of speech coding on the perception of emotions, 2016 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 1–6 (2016)

  24. U.K. Laine, M. Karjalainen, T. Altosaar, Warped linear prediction (WLP) in speech and audio processing, 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1994. ICASSP-94, Adelaide, SA, 3, III/349-III/352 (1994)

  25. G. Jyothish Lal, E.A. Gopalakrishnan, G. Divu, Epoch estimation from emotional speech signals using variational mode decomposition. Circuits Syst. Signal Process. 37 (2018)

  26. Y. Li, Q. Hao, P. Zhang, J. Jiang, X. Ma, Y. Fan, H.V. Davydau, A variable-bit-rate speech coding algorithm based on enhanced mixed excitation linear prediction, 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 915–919 (2016)

  27. P.K. Meher, B.K. Mohanty, S.K. Patel, S. Ganguly, T. Srikanthan, Efficient VLSI Architecture for Decimation-in-Time Fast Fourier Transform of Real-Valued Data. IEEE Transactions on Circuits and Systems I: Regular Papers 62(12), 2836–2845 (2015)

  28. K.K. Paliwal, B.S. Atal, Efficient vector quantization of LPC parameters at 24 bits/frame. IEEE Trans. Speech Audio Process. 1(1), 3–14 (1993)

  29. C.J. van der Merwe, J.A. du Preez, Calculation of LPC-based cepstrum coefficients using mel-scale frequency warping, COMSIG 1991 Proceedings: South African Symposium on Communications and Signal Processing, Pretoria, 17–21 (1991)

  30. L.R. Rabiner, R.W. Schafer, Digital Processing of Speech Signals (Prentice-Hall, New Jersey, 1978)

  31. A.W. Rix, J.G. Beerends, M.P. Hollier, A.P. Hekstra, Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), Salt Lake City, UT, 2, 749–752 (2001)

  32. M.S. Arun Sankar, P.S. Sathidevi, Design of MELPe based variable bit rate speech coding with Mel scale approach using low order linear prediction filter and representing excitation signal using glottal closure instants. Arabian Journal for Science and Engineering, 4(3), 1785–1801 (2019), https://doi.org/10.1007/s13369-019-04273-z

  33. K. Shikano, Evaluation of spectral matching measures for phonetic unit recognition, Internal report, Computer Science Department, Carnegie Mellon University (1986)

  34. A.S. Spanias, Speech coding: a tutorial review. Proc. IEEE 82(10), 1541–1582 (1994)

  35. R. Vergin, D. O’Shaughnessy, A. Farhat, Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition. IEEE Trans. Speech Audio Process. 7(5), 525–532 (1999)

  36. C.M. Vikram, P. Mahadeva, Epoch Extraction From Telephone Quality Speech Using Single Pole Filter, IEEE/ACM Transactions on Audio, Speech, and Language Processing (2017)

  37. Y. Zhang, L. Ni, Feature extraction algorithm fusing GFCC and phase information, 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 1163–1167 (2017)

Acknowledgements

The authors would like to thank the Department of Science & Technology, Government of India, for supporting this work under FIST scheme No. SR/FST/ET-I/2017/68.

Author information

Corresponding author

Correspondence to M. S. Arun Sankar.

Data Availability

The current study used three datasets (the TIMIT, P.862, and James Hillenbrand databases) for performance analysis of various aspects. Information regarding these, along with links for accessing them, is given in [11] and [14].

Cite this article

Sankar, M.S.A., Sathidevi, P.S. Mel Scale-Based Linear Prediction Approach to Reduce the Prediction Filter Order in CELP Paradigm. Circuits Syst Signal Process 40, 3813–3835 (2021). https://doi.org/10.1007/s00034-021-01647-3

  • DOI: https://doi.org/10.1007/s00034-021-01647-3
