Abstract
To account for the strong non-stationarity of voiced speech and its nonlinear aero-acoustic origin, the classical source-filter model is extended to a cascaded drive-response model with a conventional linear secondary response, a synchronized and/or synchronously modulated primary response, and a non-stationary fundamental drive which plays the role of the long time-scale part of the basic time-scale separation of acoustic perception. The transmission protocol of voiced speech is assumed to be based on non-stationary acoustic objects which can be synthesized as the described secondary response and which are analysed by introducing a self-consistent (filter-stable) part-tone decomposition, suited to reconstruct the hidden fundamental drive and to confirm its topological equivalence to a glottal master oscillator. The filter-stable part-tone decomposition opens the option of a phase modulation transmission protocol of voiced speech. Aiming at communication channel invariant acoustic features of voiced speech, the phase modulation cues are expected to be particularly suited to extend and/or replace the classical feature vectors of phoneme and speaker recognition.
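The idea of part-tones that are phase-locked to a non-stationary fundamental drive can be illustrated with a toy example. The sketch below is not the filter-stable part-tone decomposition of the paper: it assumes that the drive phase is already known, whereas the paper reconstructs it from the signal. The function names (`drive_phase`, `synthesize`, `demodulate`), the linear pitch glide from 100 Hz to 130 Hz, and the amplitudes and phase offsets are all hypothetical choices made for illustration.

```python
import cmath
import math


def drive_phase(f_start, f_end, duration, fs):
    """Phase of a toy fundamental drive whose frequency glides linearly."""
    n = int(duration * fs)
    phase, out = 0.0, []
    for i in range(n):
        out.append(phase)
        f0 = f_start + (f_end - f_start) * i / n  # instantaneous frequency in Hz
        phase += 2.0 * math.pi * f0 / fs          # accumulate phase sample by sample
    return out


def synthesize(phase, amps, offsets):
    """Sum of part-tones locked to integer multiples of the drive phase."""
    return [
        sum(a * math.cos(k * p + th)
            for k, (a, th) in enumerate(zip(amps, offsets), start=1))
        for p in phase
    ]


def demodulate(signal, phase, k):
    """Estimate the complex amplitude of part-tone k by heterodyning
    the signal with k times the (assumed known) drive phase."""
    acc = sum(x * cmath.exp(-1j * k * p) for x, p in zip(signal, phase))
    return 2.0 * acc / len(signal)


# Assumed toy parameters: 0.5 s at 8 kHz, three part-tones.
phase = drive_phase(100.0, 130.0, 0.5, 8000)
amps, offsets = [1.0, 0.5, 0.25], [0.0, 0.8, -1.2]
signal = synthesize(phase, amps, offsets)
estimates = [demodulate(signal, phase, k) for k in (1, 2, 3)]

for k, z in enumerate(estimates, start=1):
    print(k, round(abs(z), 2), round(cmath.phase(z), 2))
```

Because every part-tone phase is an integer multiple of the drive phase plus a constant offset, demodulating with the drive phase turns each part-tone into a nearly constant complex number even though the pitch is gliding. The magnitude and angle of that number approximate the amplitude and phase offset used in the synthesis.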
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Drepper, F.R. (2007). Non-stationary Self-consistent Acoustic Objects as Atoms of Voiced Speech. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, JL. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_16
DOI: https://doi.org/10.1007/978-3-540-77347-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77346-7
Online ISBN: 978-3-540-77347-4
eBook Packages: Computer Science, Computer Science (R0)