
Interpolation of pitch contour using temporal decomposition

International Journal of Speech Technology

Abstract

A new method is presented for predicting the pitch contour of a speech signal from a small number of pitch values, intended for very low rate speech coding. The method relies on the correlation between phonetic evolution and pitch variations during voiced speech segments. Temporal Decomposition (TD) is used to track the phonetic evolution and to identify perceptually significant time points. TD detects event functions, which serve as interpolation paths, and their centroids, which mark the steadiest points in the spectral parameter space. Together these provide the information needed both to determine the critical pitch values and to estimate the pitch contour. The proposed method is shown to reduce the amount of pitch information to about one-tenth of that required by conventional frame-by-frame techniques, with less than 5% error in pitch approximation.
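
The idea of using event functions as interpolation paths can be illustrated with a minimal sketch. The abstract does not give the interpolation formula, so everything below is an assumption made for illustration only: the raised-cosine event shape, the normalised weighted sum, the frame counts, and the names `raised_cosine_event` and `interpolate_pitch` are hypothetical and are not taken from the paper.

```python
import math

def raised_cosine_event(n_frames, centre, half_width):
    """Hypothetical event function: a raised-cosine bump that peaks at `centre`
    and falls to zero `half_width` frames away. Real TD event functions are
    estimated from spectral parameters, not generated like this."""
    phi = []
    for n in range(n_frames):
        d = abs(n - centre)
        if d < half_width:
            phi.append(0.5 * (1.0 + math.cos(math.pi * d / half_width)))
        else:
            phi.append(0.0)
    return phi

def interpolate_pitch(event_functions, centroid_pitch):
    """Assumed interpolation rule: at each frame, average the pitch values
    stored at the event centroids, weighted by the event functions."""
    n_frames = len(event_functions[0])
    contour = []
    for n in range(n_frames):
        weights = [phi[n] for phi in event_functions]
        total = sum(weights)
        if total == 0.0:
            contour.append(None)  # no event is active at this frame
        else:
            blended = sum(w * p for w, p in zip(weights, centroid_pitch))
            contour.append(blended / total)
    return contour

# Toy data (invented): three events over 31 frames, pitch in Hz at each centroid.
centres = [5, 15, 25]
pitch_at_centroids = [120.0, 140.0, 110.0]
events = [raised_cosine_event(31, c, 10) for c in centres]
contour = interpolate_pitch(events, pitch_at_centroids)
```

In this toy setup only the three centroid pitch values would need to be stored, and the contour between them follows the overlap of neighbouring event functions. At a centroid the contour equals the stored value, and midway between two centroids it is their average.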




About this article

Cite this article

Ghaemmaghami, S., Deriche, M. & Boashash, B. Interpolation of pitch contour using temporal decomposition. Int J Speech Technol 2, 215–225 (1998). https://doi.org/10.1007/BF02111209


  • DOI: https://doi.org/10.1007/BF02111209
