Abstract
Multipitch tracking benefits speech separation, audio transcription, and many other tasks. In this paper, we improve a state-of-the-art multipitch tracking algorithm. Whereas previous algorithms used the amplitudes and individual peak positions of the autocorrelation function (ACF), we propose a novel feature based on the average frequency of each time-frequency (T-F) unit. This feature is computed by an empirical mode decomposition (EMD) method. It is then used to form the conditional probabilities of the hidden Markov model (HMM) given the pitch state of each frame, and the most likely state sequence is finally found by search. Quantitative evaluations show that the novel feature is more effective and that our algorithm significantly outperforms the previous one.
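To illustrate the final step, the search for the most likely HMM state sequence, the following is a minimal sketch of standard Viterbi decoding in the log domain. It is not the paper's implementation: the abstract does not name the search procedure, and the function name `viterbi`, the two-state toy model, and all probability values are assumptions made for illustration. In the actual algorithm the states would be pitch hypotheses and the per-frame likelihoods would come from the EMD-based feature.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Return the most likely state sequence of an HMM (generic sketch).

    log_init[s]     : log prior of state s in the first frame
    log_trans[p][s] : log probability of moving from state p to state s
    log_obs[t][s]   : log likelihood of frame t's features given state s
    """
    n_states = len(log_init)
    # Best log score of any path ending in each state at the first frame.
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def to_log(rows):
    """Convert a nested list of probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical toy model: two pitch states with "sticky" transitions.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = to_log([[0.9, 0.1], [0.1, 0.9]])
# Frame 1 weakly favours state 1, but its neighbours strongly favour state 0.
log_obs = to_log([[0.8, 0.2], [0.4, 0.6], [0.8, 0.2]])

path = viterbi(log_init, log_trans, log_obs)
print(path)  # [0, 0, 0]
```

In this toy case a frame-by-frame maximum would pick state 1 for the middle frame, whereas the sequence-level search keeps state 0 throughout because the transition probabilities penalize rapid switching. This is the general reason an HMM is useful for enforcing pitch continuity across frames.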
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Jiang, W., Liu, WJ., Tan, YW., Liang, S. (2014). An Improved Multipitch Tracking Algorithm with Empirical Mode Decomposition. In: Li, S., Liu, C., Wang, Y. (eds) Pattern Recognition. CCPR 2014. Communications in Computer and Information Science, vol 484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45643-9_22
DOI: https://doi.org/10.1007/978-3-662-45643-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45642-2
Online ISBN: 978-3-662-45643-9
eBook Packages: Computer Science (R0)