Robust analysis for improvement of vowel onset point detection under noisy conditions

International Journal of Speech Technology

Abstract

The vowel onset point (VOP) is the instant of time at which the vowel region starts in a speech signal. VOPs are used as anchor points in the design of various speech-based systems. Different algorithms exist in the literature to identify the occurrences of vowels in continuous spoken utterances. The algorithm based on combined evidence derived from source excitation, spectral peaks and modulation spectrum has been used as the baseline system for the present study. The baseline system provides a satisfactory level of performance on clean data. However, on noisy data its performance may be improved further by additional pre-processing of the raw speech data and post-processing of the detected VOPs. In this paper we propose to use speech enhancement techniques as a pre-processing module to remove the noise from the speech data under different noisy conditions. The pre-processed speech data is then passed through the baseline system to detect the VOPs. It has been observed that several spurious VOPs remain at the output of the baseline system. We propose to use a post-processing module based on average signal-to-noise ratio and information derived from the glottal closure instants to remove the spurious VOPs. The experiments were carried out on clean data, data with artificially injected noise, and data collected from practical noisy environments. The results suggest that the proposed system using pre-processing and post-processing modules is robust and shows an improvement of 28–35% over the existing baseline system by removing the spurious VOPs under different noisy conditions.
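
The post-processing idea in the abstract can be illustrated with a small sketch. The abstract does not give the actual decision rule, so everything below is an assumption made for illustration: the function name `prune_vops_by_snr`, the 30 ms analysis window, the 3 dB threshold, and the use of a leading noise-only segment to estimate noise power are all hypothetical. The glottal-closure-instant evidence that the paper also uses is not modelled here.

```python
import numpy as np


def prune_vops_by_snr(signal, fs, vops_s, noise_power, win_s=0.03, min_snr_db=3.0):
    """Keep candidate VOPs whose following window has enough average SNR.

    Hypothetical rule (not taken from the paper): a candidate is kept when
    the average SNR of the `win_s` seconds after it is at least `min_snr_db`.
    """
    win = max(1, int(round(win_s * fs)))
    kept = []
    for t in vops_s:
        start = int(round(t * fs))
        seg = signal[start:start + win]
        if seg.size == 0:
            continue  # candidate lies beyond the end of the signal
        seg_power = float(np.mean(seg ** 2))
        # Subtract the noise power to approximate the speech power, floored
        # at a tiny positive value so the logarithm stays defined.
        speech_power = max(seg_power - noise_power, 1e-12)
        snr_db = 10.0 * np.log10(speech_power / noise_power)
        if snr_db >= min_snr_db:
            kept.append(t)
    return kept


# Synthetic demonstration: low-level noise with a 200 Hz tone between
# 0.5 s and 0.7 s standing in for a vowel region.
fs = 8000
rng = np.random.default_rng(0)
time = np.arange(fs) / fs
signal = 0.01 * rng.standard_normal(fs)
vowel = (time >= 0.5) & (time < 0.7)
signal[vowel] += 0.5 * np.sin(2 * np.pi * 200 * time[vowel])

# Assumption: the first 100 ms contain noise only.
noise_power = float(np.mean(signal[: fs // 10] ** 2))

# One candidate inside a noise-only region and one at the true onset.
kept = prune_vops_by_snr(signal, fs, [0.2, 0.5], noise_power)
print(kept)
```

In this toy setting the candidate at 0.2 s falls in a noise-only region and is discarded, while the candidate at 0.5 s is retained. A real system would estimate the noise power adaptively and combine this check with the glottal closure evidence.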

Acknowledgments

This work is supported by the project titled “Development of Speech based Multi-Level Person Authentication System”, funded by the Department of Information Technology (DIT), New Delhi, India.

Author information

Corresponding author

Correspondence to Partha Saha.

About this article

Cite this article

Saha, P., Baruah, U., Laskar, R.H. et al. Robust analysis for improvement of vowel onset point detection under noisy conditions. Int J Speech Technol 19, 433–448 (2016). https://doi.org/10.1007/s10772-016-9336-6

  • DOI: https://doi.org/10.1007/s10772-016-9336-6
