Detection of Fricative Landmarks Using Spectral Weighting: A Temporal Approach

Abstract

Fricatives are characterized by two principal acoustic properties: a concentration of spectral energy at high frequencies and a noise-like nature. Spectral-domain approaches for detecting fricatives employ a time–frequency representation to compute acoustic cues such as the band energy ratio, spectral centroid, and dominant resonant frequency, so their detection accuracy depends on the effectiveness of the chosen time–frequency representation. This work explores an approach that does not require any time–frequency representation for detecting fricatives from speech. A time-domain operation is proposed that implicitly emphasizes the high-frequency spectral characteristics of fricatives: the spectrum of the speech signal is scaled by a weighting function \(k^2\), where k is the discrete frequency. This spectral weighting can be approximated by a cascaded temporal difference operation applied to the speech signal. The emphasized regions in the spectrally weighted speech signal are then quantified to detect fricative regions. In contrast to spectral-domain approaches, the predictability-measure-based approach in the literature relies on capturing the noisy nature of fricatives. Since the proposed approach and the predictability-measure-based approach exploit two complementary properties for detecting fricatives, a combination of the two is also put forth in this work. The proposed approach performs better than state-of-the-art fricative detectors. To study the significance of the proposed evidence, an early fusion between the proposed evidence and feature-space maximum log-likelihood transform features is explored for developing speech recognition systems.
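A minimal sketch of the core idea in the abstract is given below. It is an illustration under assumptions not stated here, not the authors' implementation: the sampling rate, frame length, hop size, and energy-based quantification are hypothetical choices. The intuition is that a first-order temporal difference has magnitude response \(2\sin(\pi k/N)\), which grows roughly linearly with the discrete frequency k at low frequencies, so cascading two such differences approximates a spectral weighting proportional to \(k^2\) and emphasizes the high-frequency energy characteristic of fricatives.

```python
import numpy as np

# Illustrative sketch only (assumed parameters, not the paper's settings):
# cascading two first-order temporal differences approximates a k^2 spectral
# weighting, since each difference has magnitude response 2*sin(pi*k/N),
# i.e. roughly proportional to k at low frequencies.

def cascaded_difference(x):
    """Second-order temporal difference ~ implicit k^2 spectral weighting."""
    return np.diff(x, n=2)

def frame_energy(x, win_len=200, hop=80):
    """Short-time energy of the weighted signal (hypothetical 25 ms / 10 ms
    frames at 8 kHz; the paper's quantification step may differ)."""
    starts = range(0, len(x) - win_len + 1, hop)
    return np.array([np.sum(x[s:s + win_len] ** 2) for s in starts])

# Synthetic example: high-frequency noise (fricative-like) retains far more
# energy after the weighting than a low-frequency tone (sonorant-like).
fs = 8000
t = np.arange(fs) / fs
sonorant_like = np.sin(2 * np.pi * 200 * t)
fricative_like = 0.1 * np.random.randn(fs)
print(frame_energy(cascaded_difference(sonorant_like)).mean())
print(frame_energy(cascaded_difference(fricative_like)).mean())
```

Frames whose weighted energy stands out would then be hypothesized as fricative regions, mirroring the quantification of emphasized regions described in the abstract.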

Data Availability

The data that support the findings of this study are available from the corresponding author, Hari Krishna Vydana, upon reasonable request.

Acknowledgements

The authors would like to thank MeitY (Ministry of Electronics and Information Technology) for supporting the research under the Visvesvaraya PhD fellowship scheme.

Author information

Corresponding author

Correspondence to Hari Krishna Vydana.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vydana, H.K., Vuppala, A.K. Detection of Fricative Landmarks Using Spectral Weighting: A Temporal Approach. Circuits Syst Signal Process 40, 2376–2399 (2021). https://doi.org/10.1007/s00034-020-01576-7
