Voiced/non-voiced speech classification using adaptive thresholding with bivariate EMD

  • Short Paper
Pattern Analysis and Applications

Abstract

This paper introduces a robust voiced/non-voiced (VnV) speech classification method based on bivariate empirical mode decomposition (bEMD). Fractional Gaussian noise (fGn) is employed as a reference signal to derive a data-adaptive threshold for VnV discrimination. The speech signal under analysis and the fGn are combined into a complex signal, which bEMD decomposes into a finite number of complex-valued intrinsic mode functions (IMFs). The real and imaginary parts of these IMFs represent the IMFs of the observed speech and of the fGn, respectively. The log-energies of both sets of IMFs are calculated. Because the IMF log-energy representation of fGn resembles that of unvoiced speech, the upper confidence limit of the fGn IMF log-energies is used as the data-adaptive threshold for VnV classification. If the subband log-energy of a speech segment exceeds the threshold, the segment is classified as voiced; otherwise it is classified as unvoiced. The experimental results show that the proposed algorithm outperforms recently reported methods over a wide range of SNRs without requiring any training data.
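The decision rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the function names `fgn`, `imf_log_energies` and `classify_segment` are hypothetical, the log-energy is taken as the base-2 log of the mean-square value, and the upper confidence limit is replaced by a simple additive `margin` on the fGn log-energies, since the abstract does not give its exact form. The bEMD step itself is not implemented; the complex-valued IMFs are assumed to be supplied by an external decomposition, and the demo below uses hand-built stand-ins for them.

```python
import numpy as np

def fgn(n, hurst=0.5, seed=0):
    """Draw n samples of unit-variance fractional Gaussian noise (Cholesky method)."""
    k = np.arange(n)
    # Autocovariance of fGn: 0.5 * (|k+1|^2H - 2|k|^2H + |k-1|^2H)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * hurst)
                   - 2 * np.abs(k) ** (2 * hurst)
                   + np.abs(k - 1) ** (2 * hurst))
    cov = gamma[np.abs(k[:, None] - k[None, :])]  # Toeplitz covariance matrix
    rng = np.random.default_rng(seed)
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

def imf_log_energies(imfs):
    """Base-2 log of the mean-square energy of each IMF (one IMF per row)."""
    imfs = np.asarray(imfs, dtype=float)
    return np.log2(np.mean(imfs ** 2, axis=1) + 1e-12)

def classify_segment(complex_imfs, margin=1.0):
    """Return True (voiced) if any speech subband exceeds the fGn-derived threshold.

    complex_imfs: 2-D complex array assumed to come from a bEMD of
    speech + 1j * fGn; real part = speech IMFs, imaginary part = fGn IMFs.
    margin: ASSUMED stand-in for the paper's upper confidence limit.
    """
    speech_le = imf_log_energies(np.real(complex_imfs))
    noise_le = imf_log_energies(np.imag(complex_imfs))
    threshold = noise_le + margin
    return bool(np.any(speech_le > threshold))

# Demo with hand-built stand-ins for bEMD output (not a real decomposition).
n = 256
t = np.arange(n)
reference = fgn(n, hurst=0.5, seed=0)
noise_part = np.vstack([reference, 0.5 * reference])

# Strong periodic components play the role of voiced-speech IMFs.
voiced_like = np.vstack([4.0 * np.sin(2 * np.pi * 0.05 * t),
                         2.0 * np.sin(2 * np.pi * 0.02 * t)]) + 1j * noise_part

# Weak noise-like components play the role of unvoiced-speech IMFs.
weak = fgn(n, hurst=0.5, seed=1)
unvoiced_like = np.vstack([0.1 * weak, 0.05 * weak]) + 1j * noise_part

is_voiced = classify_segment(voiced_like)
is_unvoiced_flagged = classify_segment(unvoiced_like)
```

In a full pipeline, the complex input to the decomposition would be formed as `speech + 1j * fgn(len(speech))`, and the margin would be replaced by whatever confidence limit the paper derives from the fGn IMF log-energies.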

Author information

Corresponding author

Correspondence to Md. Khademul Islam Molla.

About this article

Cite this article

Molla, M.K.I., Hirose, K. & Hasan, M.K. Voiced/non-voiced speech classification using adaptive thresholding with bivariate EMD. Pattern Anal Applic 19, 139–144 (2016). https://doi.org/10.1007/s10044-015-0449-3

  • DOI: https://doi.org/10.1007/s10044-015-0449-3