Detection of time varying pitch in tonal languages: an approach based on ensemble empirical mode decomposition

Hong, Hong; Zhu, Xiao-hua; Su, Wei-min; Geng, Run-tong; Wang, Xin-long

doi:10.1631/jzus.C1100092

Published: 27 January 2012

Volume 13, pages 139–145, (2012)

Journal of Zhejiang University SCIENCE C

Abstract

A method based on ensemble empirical mode decomposition (EEMD) is proposed for accurately detecting the time-varying pitch of speech in tonal languages. Unlike frame-, event-, or subspace-based pitch detectors, the method can accurately extract the time-varying information of pitch within a short duration, which is of crucial importance in speech processing of tonal languages. The Chinese Linguistic Data Consortium (CLDC) database for Mandarin Chinese was employed as standard speech data for evaluating the effectiveness of the method. The results show that the proposed method is more accurate and reliable, particularly in estimating tones with non-monotonically varying pitch, such as the third tone in Mandarin Chinese. The results also show that the method is strongly resistant to noise.
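
To make concrete what sample-by-sample tracking of a non-monotonic pitch contour means, the following is a minimal sketch and not the authors' algorithm. The abstract does not say how pitch is read off after the decomposition, so the EEMD stage is omitted entirely. The sketch swaps in a standard technique, the instantaneous frequency of the analytic signal, and applies it to a synthetic tone whose pitch dips and rises again. The sampling rate, duration, pitch range, and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for a short demo."""
    n = len(x)
    sign = 1 if inverse else -1
    # Twiddle factors exp(sign * 2*pi*i*k/n), looked up with (j * k) % n.
    w = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(x[j] * w[(j * k) % n] for j in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def analytic_signal(x):
    """Analytic signal of a real sequence of even length (assumption)."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC and Nyquist, double positive frequencies, zero negative ones.
    gain = [0.0] * n
    gain[0] = 1.0
    gain[n // 2] = 1.0
    for k in range(1, n // 2):
        gain[k] = 2.0
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


def instantaneous_frequency(x, fs):
    """Per-sample frequency in Hz from the phase step between samples."""
    z = analytic_signal(x)
    scale = fs / (2.0 * math.pi)
    return [cmath.phase(z[i + 1] * z[i].conjugate()) * scale
            for i in range(len(z) - 1)]


def dipping_tone(fs=2000, duration=0.2, f_min=100.0, depth=30.0):
    """Hypothetical test tone: pitch falls to f_min and rises again."""
    n = int(round(fs * duration))
    times, pitch, signal = [], [], []
    for i in range(n):
        t = i / fs
        u = 2.0 * t / duration - 1.0            # runs from -1 to 1
        pitch.append(f_min + depth * u * u)     # parabolic pitch contour
        # Phase is 2*pi times the integral of the pitch contour from 0 to t.
        cycles = f_min * t + depth * duration / 6.0 * (u ** 3 + 1.0)
        times.append(t)
        signal.append(math.cos(2.0 * math.pi * cycles))
    return times, pitch, signal


times, pitch, signal = dipping_tone()
estimate = instantaneous_frequency(signal, fs=2000)
for i in (20, 100, 200, 300, 380):
    print(f"t={times[i]:.3f} s  true={pitch[i]:.1f} Hz  est={estimate[i]:.1f} Hz")
```

The estimate is defined at every sample, so the turning point of the contour is resolved without assuming the pitch is constant over an analysis frame. On real speech the analytic-signal step is only meaningful for a single-component signal, which is why a decomposition stage is needed before it.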

Author information

Authors and Affiliations

School of Electronic Engineering and Optoelectronic Techniques, Nanjing University of Science and Technology, Nanjing, 210094, China
Hong Hong, Xiao-hua Zhu, Wei-min Su & Run-tong Geng
State Key Laboratory of Modern Acoustics, Institute of Acoustics, Nanjing University, Nanjing, 210093, China
Xin-long Wang

Corresponding author

Correspondence to Hong Hong.

Additional information

Project supported by the National Natural Science Foundation of China (No. 10574070) and the State Key Laboratory Foundation of China (No. 9140C240207060C24).

About this article

Cite this article

Hong, H., Zhu, Xh., Su, Wm. et al. Detection of time varying pitch in tonal languages: an approach based on ensemble empirical mode decomposition. J. Zhejiang Univ. - Sci. C 13, 139–145 (2012). https://doi.org/10.1631/jzus.C1100092

Received: 13 April 2011
Accepted: 09 August 2011
Published: 27 January 2012
Issue Date: February 2012
DOI: https://doi.org/10.1631/jzus.C1100092

Key words

CLC number

TN912.3
