Mel Scale-Based Linear Prediction Approach to Reduce the Prediction Filter Order in CELP Paradigm

Sankar, M. S. Arun; Sathidevi, P. S.

doi:10.1007/s00034-021-01647-3

Published: 25 January 2021

Circuits, Systems, and Signal Processing, Volume 40, pages 3813–3835 (2021)

Abstract

This paper proposes a novel method to reduce the order of the prediction filter from 10 to 7 in the Code Excited Linear Prediction (CELP) coding framework by incorporating the psychoacoustic Mel scale into Linear Predictive Coding (Mel-LPC). Efficient quantization of the Mel-LPC parameters using 2-split Vector Quantization (VQ) yields a reduction of 4 bits/frame, for a total bit-rate saving of 200 bps. A weighting scheme for the Euclidean distance measure yields a reduction of 6 bits/frame, for a total bit-rate saving of 300 bps. For unvoiced frames, a lower Mel-LPC order of 3 is employed, with perceptual quality as the selection criterion, and an efficient 5-bit VQ method is developed that brings the average bit requirement down to 11.5 bits/frame. To incorporate this into the Mel-LPC-based CELP encoding scheme, a neural network-based voiced-unvoiced classification algorithm using 5 derived features as input is constructed. This selection of filter order based on signal statistics reduces the bit rate by 625 and 325 bps, respectively, for 10th-order LPC and 7th-order Mel-LPC. In addition, incorporating Mel-LPC gives better performance in formant estimation.
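
The per-frame and per-second savings quoted in the abstract are related by simple arithmetic. The sketch below is illustrative only. The abstract does not state the frame rate, so it is inferred here from the quoted pairs (4 bits/frame with 200 bps, 6 bits/frame with 300 bps). The function and variable names are hypothetical.

```python
def implied_frame_rate(bps_saved: float, bits_saved_per_frame: float) -> float:
    """Frames per second implied by a bit-rate saving and its per-frame saving."""
    return bps_saved / bits_saved_per_frame


def bits_per_frame(bps_saved: float, frames_per_second: float) -> float:
    """Average per-frame saving implied by a bit-rate saving at a given frame rate."""
    return bps_saved / frames_per_second


# Figures quoted in the abstract.
split_vq_rate = implied_frame_rate(200, 4)   # 2-split VQ: 4 bits/frame, 200 bps
weighted_rate = implied_frame_rate(300, 6)   # weighted distance: 6 bits/frame, 300 bps

# Assumption: one parameter update per frame, so the frame interval is 1/rate.
frame_interval_ms = 1000 / split_vq_rate

# Derived, not reported: average per-frame savings behind the 625 and 325 bps figures.
lpc10_bits = bits_per_frame(625, split_vq_rate)
mel7_bits = bits_per_frame(325, split_vq_rate)

print(split_vq_rate, weighted_rate, frame_interval_ms, lpc10_bits, mel7_bits)
```

Both quoted pairs imply the same rate of 50 frames per second, or one frame every 20 ms. At that inferred rate, the 625 and 325 bps savings correspond to average savings of 12.5 and 6.5 bits/frame. These per-frame values are derived here and are not reported in the abstract.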

References

A.K.H. Al-Ali, D. Dean, B. Senadji, V. Chandran, G.R. Naik, Enhanced forensic speaker verification using a combination of DWT and MFCC feature warping in the presence of noise and reverberation conditions. IEEE Access 5, 15400–15413 (2017). https://doi.org/10.1109/ACCESS.2017.2728801
A. Albahri, M. Lech, Effects of band reduction and coding on speech emotion recognition, 2016 International Conference on Signal Processing and Communication Systems, 12, 1–8 (2016)
B.S. Atal, The history of linear prediction. IEEE Signal Process. Mag. 23(2), 154–161 (2006)
P. Boersma, D. Weenink, Praat: doing phonetics by computer, Version 6.0.40 (2018)
M. Bouzid, S.E. Cheraitia, M. Hireche, Switched split vector quantizer applied for encoding the LPC parameters of the 2.4 Kbits/s MELP speech coder, 2010 7th International Multi-Conference on Systems, Signals and Devices, Amman, 1–5 (2010)
C. Cannam, C. Landone, M. Sandler, An Open Source Application for Viewing, Analysing, and Annotating Music Audio Files, Proceedings of the ACM Multimedia 2010 International Conference, Firenze, Italy, 1467–1468 (2010)
W.C. Chu, Speech coding algorithms: foundation and evolution of standardized coders (Wiley, Hoboken, 2004)
A. M. De Lima Araujo, F. Violaro, Formant frequency estimation using a Mel-scale LPC algorithm, Telecommunications Symposium, 1998. ITS ’98 Proceedings. SBT/IEEE International, Sao Paulo, 1, 207–212 (1998)
H. Deng, D. O’Shaughnessy, Voiced-unvoiced-silence speech sound classification based on unsupervised learning, 2007 IEEE International Conference on Multimedia and Expo, Beijing, 176–179 (2007)
N. Dey, A.S. Ashour, Direction of arrival estimation and localization of multi-speech sources. SpringerBriefs Electr. Comput. Eng. (2018). https://doi.org/10.1007/978-3-319-73059-2
J.S. Garofolo et al., TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1 (Linguistic Data Consortium, Philadelphia, 1993)
J.D. Gibson, Speech coding methods, standards, and applications. IEEE Circuits Syst. Mag. 5(4), 30–49 (2005)
A. Gray, J. Markel, Distance measures for speech processing. IEEE Trans. Acoustics Speech Signal Process. 24(5), 380–391 (1976)
J.M. Hillenbrand, L.A. Getty, M.J. Clark, K. Wheeler, Acoustic characteristics of American English vowels. J. Acoustical Soc. Am. 3011–3099 (1995)
ITU-T Enhanced Voice Services (EVS) coder, Codec for Enhanced Voice Services (EVS); Performance Characterization, (2014)
ITU-T. Recommendation P.862.1 Mapping function for transforming P.862 raw result scores to MOS-LQO, (2003)
ITU-T G.720.1: Generic Sound Activity Detector (Series G: Transmission Systems and Media, Digital Systems and Networks: Digital Terminal Equipments - Coding of Voice and Audio Signals). Technical Report Telecommunication standardization sector of ITU (ITU-T). https://www.itu.int/rec/T-REC-G.720.1 (2010)
R. Jarina, J. Polacký, P. Počta, M. Chmulík, Automatic speaker verification on narrowband and wideband lossy coded clean speech. IET Biometrics 6(4), 276–281 (2017)
J. Polacký, P. Počta, R. Jarina, An impact of narrowband speech codec mismatch on a performance of GMM-UBM speaker recognition over telecommunication channel. Commun. Sci. Lett. Univ. Zilina 18, 23–28 (2016)
S. Kadiri, A Quantitative Comparison of Epoch Extraction Algorithms for Telephone Speech, 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2019. Proceedings. (ICASSP ’19), 6500–6504 (2019)
A.I. Koutrouvelis, G.P. Kafentzis, N.D. Gaubitch, R. Heusdens, A fast method for high-resolution voiced/unvoiced detection and glottal closure/opening instant estimation of speech. IEEE Trans. Audio Speech Lang. Process. 24(2), 316–328 (2016)
E. Kruger, H.W. Strube, Linear prediction on a warped frequency scale. IEEE Trans. Acoustics Speech Signal Process. 36(9), 1529–1531 (1988)
F. Labelle, R. Lefebvre, P. Gournay, A subjective evaluation of the effects of speech coding on the perception of emotions, 2016 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 1–6 (2016)
U.K. Laine, M. Karjalainen, T. Altosaar, Warped linear prediction (WLP) in speech and audio processing, 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1994. ICASSP-94, Adelaide, SA, 3, III/349-III/352 (1994)
G. Jyothish Lal, E.A. Gopalakrishnan, G. Divu, Epoch estimation from emotional speech signals using variational mode decomposition. Circuits, Systems, and Signal Processing 37 (2018)
Y. Li, Q. Hao, P. Zhang, J. Jiang, X. Ma, Y. Fan, H.V. Davydau, A variable-bit-rate speech coding algorithm based on enhanced mixed excitation linear prediction, 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 915–919 (2016)
P.K. Meher, B.K. Mohanty, S.K. Patel, S. Ganguly, T. Srikanthan, Efficient VLSI Architecture for Decimation-in-Time Fast Fourier Transform of Real-Valued Data. IEEE Transactions on Circuits and Systems I: Regular Papers 62(12), 2836–2845 (2015)
K.K. Paliwal, B.S. Atal, Efficient vector quantization of LPC parameters at 24 bits/frame. IEEE Trans. Speech Audio Process. 1(1), 3–14 (1993)
C.J. van der Merwe, J.A. du Preez, Calculation of LPC-based cepstrum coefficients using mel-scale frequency warping, COMSIG 1991 Proceedings: South African Symposium on Communications and Signal Processing, Pretoria, 17–21 (1991)
L.R. Rabiner, R.W. Schafer, Digital Processing of Speech Signals (Prentice-Hall, New Jersey, 1978)
A.W. Rix, J.G. Beerends, M.P. Hollier, A.P. Hekstra, Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), Salt Lake City, UT, 2, 749–752 (2001)
M.S. Arun Sankar, P.S. Sathidevi, Design of MELPe based variable bit rate speech coding with Mel scale approach using low order linear prediction filter and representing excitation signal using glottal closure instants. Arabian Journal for Science and Engineering, 4(3), 1785–1801 (2019), https://doi.org/10.1007/s13369-019-04273-z
K. Shikano, Evaluation of spectral matching measures for phonetic unit recognition, Internal report, Computer Science Department, Carnegie Mellon University (1986)
A.S. Spanias, Speech coding: a tutorial review. Proc. IEEE 82(10), 1541–1582 (1994)
R. Vergin, D. O’Shaughnessy, A. Farhat, Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition. IEEE Trans. Speech Audio Process. 7(5), 525–532 (1999)
C.M. Vikram, P. Mahadeva, Epoch Extraction From Telephone Quality Speech Using Single Pole Filter, IEEE/ACM Transactions on Audio, Speech, and Language Processing (2017)
Y. Zhang, L. Ni, Feature extraction algorithm fusing GFCC and phase information, 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 1163–1167 (2017)

Acknowledgements

The authors would like to thank the Department of Science & Technology, Government of India, for supporting this work under FIST scheme No. SR/FST/ET-I/2017/68.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, National Institute of Technology Calicut, Kerala, India
M. S. Arun Sankar & P. S. Sathidevi

Corresponding author

Correspondence to M. S. Arun Sankar.

Ethics declarations

Data Availability

The current study used three datasets, the TIMIT, P.862, and James Hillenbrand databases, for performance analysis of various aspects. Information on these datasets, along with links for accessing them, is given in [11] and [14].

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sankar, M.S.A., Sathidevi, P.S. Mel Scale-Based Linear Prediction Approach to Reduce the Prediction Filter Order in CELP Paradigm. Circuits Syst Signal Process 40, 3813–3835 (2021). https://doi.org/10.1007/s00034-021-01647-3

Received: 23 June 2020
Revised: 28 December 2020
Accepted: 07 January 2021
Published: 25 January 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s00034-021-01647-3
