Abstract
Fricatives are characterized by two principal acoustic properties: a concentration of spectral energy at high frequencies and a noise-like nature. Spectral-domain approaches for detecting fricatives employ a time–frequency representation to compute acoustic cues such as the band energy ratio, spectral centroid, and dominant resonant frequency, so their detection accuracy depends on the quality of the chosen time–frequency representation. This work explores an approach that requires no time–frequency representation for detecting fricatives from speech. A time-domain operation is proposed that implicitly emphasizes the high-frequency spectral characteristics of fricatives: the spectrum of the speech signal is scaled by a weighting function \(k^2\), where \(k\) is the discrete frequency. This spectral weighting can be approximated as a cascaded temporal difference operation on the speech signal, and the emphasized regions of the spectrally weighted signal are quantified to detect fricative regions. In contrast to spectral-domain approaches, the predictability-measure-based approaches in the literature rely on capturing the noise-like nature of fricatives. Since the proposed approach and the predictability-measure-based approaches exploit two complementary properties, a combination of the two is also put forth in this work. The proposed approach outperforms state-of-the-art fricative detectors. To study the significance of the proposed evidence, an early fusion of the proposed evidence with feature-space maximum log-likelihood transform features is explored for developing speech recognition systems.
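The intuition behind the time-domain approximation can be illustrated with a short sketch (not the paper's implementation; signal, sampling rate, and band edges below are illustrative assumptions). A single first difference y[n] = x[n] − x[n−1] has magnitude response |1 − e^{−jω}| = 2|sin(ω/2)| ≈ ω at low frequencies, so cascading two differences weights the spectrum approximately by ω², analogous to the \(k^2\) scaling:

```python
import numpy as np

fs = 16000                       # assumed sampling rate (Hz)
n = np.arange(1024)
# toy signal: strong low-frequency tone plus a weak high-frequency,
# "frication-like" component
x = np.sin(2 * np.pi * 200 * n / fs) + 0.1 * np.sin(2 * np.pi * 6000 * n / fs)

# cascaded temporal difference: two first differences in sequence,
# approximating the k^2 spectral weighting in the time domain
y = np.diff(x, n=2)

X = np.abs(np.fft.rfft(x, 1024))
Y = np.abs(np.fft.rfft(y, 1024))

# high-band vs. low-band energy ratio before and after weighting
# (illustrative band edges: <1 kHz vs. 5-8 kHz)
low, high = slice(1, 64), slice(320, 512)
print(X[high].sum() / X[low].sum())   # low band dominates the raw signal
print(Y[high].sum() / Y[low].sum())   # high band emphasized after weighting
```

The second print is larger by orders of magnitude: the difference cascade attenuates the 200 Hz tone by roughly 4 sin²(π·200/16000) ≈ 0.006 while amplifying the 6 kHz component by roughly 3.4, which is the emphasis of high-frequency (fricative-like) regions the abstract describes.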
Data Availability
The data that support the findings of this study are available from the corresponding author, Hari Krishna Vydana upon reasonable request.
Acknowledgements
The authors would like to thank MeitY (Ministry of Electronics and Information Technology) for supporting the research under the Visvesvaraya PhD fellowship scheme.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Vydana, H.K., Vuppala, A.K. Detection of Fricative Landmarks Using Spectral Weighting: A Temporal Approach. Circuits Syst Signal Process 40, 2376–2399 (2021). https://doi.org/10.1007/s00034-020-01576-7