Data-Adaptive Single-Pole Filtering of Magnitude Spectra for Robust Keyword Spotting

Rout, Jayant Kumar; Pradhan, Gayadhar

doi:10.1007/s00034-021-01923-2

Short Paper
Published: 17 January 2022

Volume 41, pages 3023–3039, (2022)

Circuits, Systems, and Signal Processing


Abstract

This paper proposes a simple and effective data-adaptive smoothing approach to suppress pitch- and environment-induced mismatches in keyword spotting (KWS) systems. In the proposed method, the magnitude spectra are smoothed by a data-adaptive single-pole filter (DA-SPF) before the Mel frequency cepstral coefficients (MFCCs) are computed. The filter removes the high-frequency components of the spectra, which are mainly due to pitch periodicity. The pole magnitude, which controls the degree of spectral smoothing, is adapted for each analysis frame according to the normalized spectral magnitude in the 0–2500 Hz frequency band, where the formant magnitude of voiced sound units is predominant. Consequently, the magnitude spectra of pitch-sensitive voiced frames are smoothed more than those of non-voiced frames. When KWS systems are developed using MFCCs extracted from the DA-SPF smoothed spectra, referred to as single-pole smoothed (SPS)-MFCCs, KWS performance improves significantly in pitch- and noise-mismatched test conditions. For pitch-mismatched test conditions, the SPS-MFCCs yield a relative improvement of 86.12% over the MFCC baseline on the DNN-HMM-based KWS system.
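The idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact filter equation or the rule that maps band magnitude to pole magnitude, so everything specific here is an assumption made for illustration: the first-order recursion y[k] = (1 - a) x[k] + a y[k-1] applied along the frequency axis, the linear mapping from the low-band magnitude ratio to the pole, and the names `band_ratio`, `single_pole_smooth`, `da_spf`, `pole_min`, and `pole_max`. It should not be read as the authors' implementation.

```python
import numpy as np

def band_ratio(mag, sample_rate, band_hz=2500.0):
    """Fraction of each frame's total spectral magnitude lying in 0..band_hz."""
    n_bins = mag.shape[-1]
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    low = mag[..., freqs <= band_hz].sum(axis=-1)
    total = mag.sum(axis=-1)
    return low / np.maximum(total, 1e-12)

def single_pole_smooth(mag, pole):
    """Smooth one spectrum along frequency: y[k] = (1 - a) x[k] + a y[k-1]."""
    out = np.empty_like(mag, dtype=float)
    acc = float(mag[0])  # start from the first bin to avoid a start-up transient
    for k, x in enumerate(mag):
        acc = (1.0 - pole) * x + pole * acc
        out[k] = acc
    return out

def da_spf(mag_frames, sample_rate, pole_min=0.1, pole_max=0.9):
    """Assumed mapping: more low-band magnitude gives a larger pole (more smoothing)."""
    ratios = band_ratio(mag_frames, sample_rate)
    poles = pole_min + (pole_max - pole_min) * ratios
    smoothed = np.stack(
        [single_pole_smooth(f, a) for f, a in zip(mag_frames, poles)]
    )
    return smoothed, poles

# Synthetic frames (hypothetical data, not taken from the paper).
sample_rate = 16000
n_bins = 257
freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
# Voiced-like frame: 200 Hz harmonic ripple with energy concentrated at low frequencies.
voiced = (1.0 + np.cos(2 * np.pi * freqs / 200.0)) * np.exp(-freqs / 1500.0)
# Unvoiced-like frame: flat spectrum.
unvoiced = np.ones(n_bins)
frames = np.stack([voiced, unvoiced])

smoothed, poles = da_spf(frames, sample_rate)
print("pole per frame:", poles)
```

In this sketch the voiced-like frame receives the larger pole, so its pitch ripple is attenuated more strongly, while the flat frame passes through unchanged because the recursion has unit gain for a constant input. A one-directional recursion also shifts spectral peaks slightly toward higher bins; whether and how the paper compensates for that is not stated in the abstract.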

Data Availability

The two speech corpora used for the experimental evaluations in this paper are available online: WSJCAM0 Cambridge Read News and the PF-STAR British English Children’s Speech Corpus.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, National Institute of Technology Patna, Patna, India
Jayant Kumar Rout & Gayadhar Pradhan

Corresponding author

Correspondence to Jayant Kumar Rout.

About this article

Cite this article

Rout, J.K., Pradhan, G. Data-Adaptive Single-Pole Filtering of Magnitude Spectra for Robust Keyword Spotting. Circuits Syst Signal Process 41, 3023–3039 (2022). https://doi.org/10.1007/s00034-021-01923-2

Received: 25 May 2021
Revised: 25 November 2021
Accepted: 26 November 2021
Published: 17 January 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s00034-021-01923-2
