Dual estimation based vocal tract shape computation

International Journal of Speech Technology

Abstract

This paper presents a new method for estimating vocal tract shape directly from the speech signal. The method computes the cross-sectional areas of the uniform-length cylindrical tubes that make up the vocal tract. These areas are calculated from the reflection coefficients at the tube junctions, whose values depend on the areas of the adjoining tubes. A new state-space representation of the speech production system is formulated in which the reflection coefficients are the parameters. The state-space model is constructed from state equations of the glottal flow signal and the vocal tract, derived from the Liljencrants–Fant model and the concatenated tube model, respectively. A dual extended Kalman filtering algorithm estimates the unknown parameters of the system, and the estimated reflection coefficients are then used to compute the cross-sectional areas of the vocal tract. The proposed technique is compared with an existing shape estimation method proposed by Wakita. For both synthesized and natural speech signals, the performance of the proposed method is comparable to that of the existing one. However, the Kalman filter algorithm used in the proposed method allows the measurement noise covariance to be tuned to the noise level in the speech. As a result, the proposed method is seen to be more robust to noise than the existing technique.
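The last step described in the abstract, turning reflection coefficients into cross-sectional areas, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the sign convention r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) at the junction between tubes k and k+1 (other texts use the opposite sign), and it assumes the first tube's area is fixed to an arbitrary reference value, so the result is a relative area profile. The function names `areas_from_reflection` and `reflection_from_areas` are hypothetical.

```python
from typing import List, Sequence


def areas_from_reflection(r: Sequence[float], first_area: float = 1.0) -> List[float]:
    """Recover relative tube areas from junction reflection coefficients.

    Assumed convention: r[k] = (A[k+1] - A[k]) / (A[k+1] + A[k]),
    which rearranges to A[k+1] = A[k] * (1 + r[k]) / (1 - r[k]).
    `first_area` is an arbitrary reference, so areas are relative.
    """
    areas = [first_area]
    for rk in r:
        if not -1.0 < rk < 1.0:
            raise ValueError("reflection coefficients must lie strictly in (-1, 1)")
        areas.append(areas[-1] * (1.0 + rk) / (1.0 - rk))
    return areas


def reflection_from_areas(areas: Sequence[float]) -> List[float]:
    """Inverse mapping under the same assumed sign convention."""
    return [(b - a) / (b + a) for a, b in zip(areas, areas[1:])]


if __name__ == "__main__":
    # Illustrative coefficients only, not values from the paper.
    coeffs = [0.0, 0.5, -0.5]
    print(areas_from_reflection(coeffs))
```

A coefficient of zero leaves the area unchanged across a junction, a positive coefficient widens the next tube, and a negative one narrows it. Because only area ratios are determined, an absolute scale has to come from elsewhere.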

References

  • Bar-Shalom, Y., Li, X. R., & Kirubarajan, T. (2004). Estimation with applications to tracking and navigation: Theory algorithms and software. New York: Wiley.

  • Fant, G., Liljencrants, J., & Lin, Q. (1985). A four-parameter model of glottal flow. STL-QPSR, 4(1985), 1–13.

  • Haykin, S. S., et al. (2001). Kalman filtering and neural networks. Hoboken: Wiley.

  • Hu, Y. (2007). Subjective evaluation and comparison of speech enhancement algorithms. Speech Communication, 49, 588–601.

  • Hwang, I., Balakrishnan, H., & Tomlin, C. (2006). State estimation for hybrid systems: Applications to aircraft tracking. IEE Proceedings Control Theory and Applications, 153(5), 556.

  • Mathur, S., Story, B. H., & Rodríguez, J. J. (2006). Vocal-tract modeling: Fractional elongation of segment lengths in a waveguide model with half-sample delays. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1754–1762.

  • Mullen, J., Howard, D. M., & Murphy, D. T. (2007). Real-time dynamic articulations in the 2-D waveguide mesh vocal tract model. IEEE Transactions on Audio, Speech, and Language Processing, 15(2), 577–585.

  • Plumpe, M. D., Quatieri, T. F., Reynolds, D., et al. (1999). Modeling of the glottal flow derivative waveform with application to speaker identification. IEEE Transactions on Speech and Audio Processing, 7(5), 569–586.

  • Quatieri, T. F. (2006). Discrete-time speech signal processing: Principles and practice. Delhi: Pearson Education India.

  • Routray, A., Pradhan, A. K., & Rao, K. P. (2002). A novel Kalman filter for frequency estimation of distorted signals in power systems. IEEE Transactions on Instrumentation and Measurement, 51(3), 469–479.

  • Sahoo, S., & Routray, A. (2016). A novel method of glottal inverse filtering. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(7), 1230–1241.

  • Schroeder, M. R. (1967). Determination of the geometry of the human vocal tract by acoustic measurements. The Journal of the Acoustical Society of America, 41(4B), 1002–1010.

  • Schroeter, J., & Sondhi, M. M. (1994). Techniques for estimating vocal-tract shapes from the speech signal. IEEE Transactions on Speech and Audio Processing, 2(1), 133–150.

  • Skordilis, Z. I., Toutios, A., Töger, J., & Narayanan, S. (2017). Estimation of vocal tract area function from volumetric Magnetic Resonance Imaging. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 924–928). IEEE.

  • Sondhi, M. M., & Gopinath, B. (1971). Determination of vocal-tract shape from impulse response at the lips. The Journal of the Acoustical Society of America, 49(6B), 1867–1873.

  • Sorensen, T., Toutios, A., Goldstein, L., & Narayanan, S. S. (2016). Characterizing vocal tract dynamics with real-time MRI. In 15th Conference on Laboratory Phonology, Ithaca, NY.

  • Story, B. H., Titze, I. R., & Hoffman, E. A. (1996). Vocal tract area functions from magnetic resonance imaging. The Journal of the Acoustical Society of America, 100(1), 537–554.

  • Wakita, H. (1973). Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. IEEE Transactions on Audio and Electroacoustics, 21(5), 417–427.

Author information

Corresponding author

Correspondence to Subhasmita Sahoo.

About this article

Cite this article

Sahoo, S., Routray, A. Dual estimation based vocal tract shape computation. Int J Speech Technol 22, 575–584 (2019). https://doi.org/10.1007/s10772-018-9538-1

  • DOI: https://doi.org/10.1007/s10772-018-9538-1
