Abstract
A new method is presented for predicting the pitch contour of a speech signal from a small number of pitch values, aimed at very low-rate speech coding. The method relies on the correlation between phonetic evolution and pitch variation during voiced speech segments. Temporal Decomposition (TD) is used to track the phonetic evolution and to identify perceptually significant time points. By detecting event functions and their centroids in the spectral parameter space, TD supplies the information needed both to determine the critical pitch values and to estimate the pitch contour: the event functions serve as interpolation paths, and the centroids mark the most steady points. The proposed method is shown to reduce the amount of pitch information to about one-tenth of that required by conventional frame-by-frame techniques, with less than 5% error in pitch approximation.
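The interpolation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the interpolation formula, so the normalised weighted sum below is an assumption, and the bell-shaped `gaussian_event` functions are hypothetical stand-ins for event functions that TD would derive from spectral parameters. The contour, frame count, and event positions are synthetic toy values.

```python
import math


def gaussian_event(center, width, n_frames):
    """Hypothetical bell-shaped event function centred on frame `center`.

    Real TD event functions are estimated from spectral parameters;
    a Gaussian is used here only as a stand-in.
    """
    return [math.exp(-0.5 * ((n - center) / width) ** 2) for n in range(n_frames)]


def interpolate_pitch(events, centroid_pitch):
    """Blend one pitch value per event into a frame-by-frame contour.

    events: list of event functions, each a list of per-frame weights.
    centroid_pitch: pitch in Hz sampled at each event's centroid.
    Assumed rule: normalised weighted sum of the centroid pitch values.
    """
    n_frames = len(events[0])
    contour = []
    for n in range(n_frames):
        weights = [phi[n] for phi in events]
        total = sum(weights)
        # Normalise so that the weights at each frame sum to one.
        blended = sum(w * p for w, p in zip(weights, centroid_pitch)) / total
        contour.append(blended)
    return contour


def mean_relative_error(estimate, reference):
    """Average of |estimate - reference| / reference over all frames."""
    return sum(abs(e - r) / r for e, r in zip(estimate, reference)) / len(reference)


# Toy voiced segment: 40 frames described by 4 events,
# so 4 pitch values are kept instead of 40.
n_frames = 40
centers = [5, 15, 25, 35]
events = [gaussian_event(c, 4.0, n_frames) for c in centers]

# Synthetic, slowly rising reference contour in Hz.
reference = [120.0 + 0.5 * n for n in range(n_frames)]
centroid_pitch = [reference[c] for c in centers]

contour = interpolate_pitch(events, centroid_pitch)
print("fraction of pitch values kept:", len(centroid_pitch) / n_frames)
print("mean relative error:", mean_relative_error(contour, reference))
```

Because every frame is a convex combination of the centroid values, the reconstructed contour never leaves the range spanned by the transmitted pitch values. The error printed for this toy contour says nothing about the results reported for real speech.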
Cite this article
Ghaemmaghami, S., Deriche, M. & Boashash, B. Interpolation of pitch contour using temporal decomposition. Int J Speech Technol 2, 215–225 (1998). https://doi.org/10.1007/BF02111209