Abstract
This paper introduces a robust voiced/non-voiced (VnV) speech classification method based on bivariate empirical mode decomposition (bEMD). Fractional Gaussian noise (fGn) serves as the reference signal from which a data-adaptive threshold for VnV discrimination is derived. The speech signal under analysis and the fGn are combined into a single complex signal, which bEMD decomposes into a finite number of complex-valued intrinsic mode functions (IMFs). The real and imaginary parts of these IMFs represent the IMFs of the observed speech and of the fGn, respectively, and the log-energy of each is calculated. Because the IMF log-energy representations of fGn and of unvoiced speech are similar, the upper confidence limit of the fGn IMF log-energies is used as the data-adaptive threshold for VnV classification: a speech segment is classified as voiced if its subband log-energy exceeds the threshold, and as unvoiced otherwise. Experimental results show that the proposed algorithm outperforms recently reported methods over a wide range of SNRs, without requiring any training data.
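The decision step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the complex-valued IMFs have already been produced by some bEMD routine (not shown here), and the function names, the `margin` parameter, and the rule "threshold = fGn IMF log-energy plus a fixed margin" are hypothetical stand-ins for the upper confidence limit, whose exact form the abstract does not give.

```python
import math

def log_energy(samples):
    """Natural log of the mean squared amplitude (floored to avoid log(0))."""
    energy = sum(x * x for x in samples) / len(samples)
    return math.log(max(energy, 1e-12))

def split_complex_imfs(complex_imfs):
    """Real parts are taken as speech IMFs, imaginary parts as fGn IMFs."""
    speech = [[z.real for z in imf] for imf in complex_imfs]
    reference = [[z.imag for z in imf] for imf in complex_imfs]
    return speech, reference

def classify_segment(complex_imfs, margin=1.0):
    """Label a segment 'voiced' if any speech IMF exceeds its threshold.

    ASSUMPTION: the per-IMF threshold is the fGn IMF log-energy plus a
    fixed margin; this only approximates an upper confidence limit.
    """
    speech, reference = split_complex_imfs(complex_imfs)
    for s_imf, r_imf in zip(speech, reference):
        threshold = log_energy(r_imf) + margin
        if log_energy(s_imf) > threshold:
            return "voiced"
    return "unvoiced"
```

Here `complex_imfs` is a list of IMFs, each a list of Python `complex` samples. In practice the margin would be replaced by a confidence limit estimated from the fGn IMF log-energies, as the abstract states.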





Cite this article
Molla, M.K.I., Hirose, K. & Hasan, M.K. Voiced/non-voiced speech classification using adaptive thresholding with bivariate EMD. Pattern Anal Applic 19, 139–144 (2016). https://doi.org/10.1007/s10044-015-0449-3