
Combined X-ray and facial videos for phoneme-level articulator dynamics

Original Article · The Visual Computer

Abstract

This paper integrates dynamic external and internal articulator motions into a low-cost, data-driven three-dimensional talking head. External and internal articulations are captured from video streams and videofluoroscopy, respectively, and calibrated to a generic 3D talking head model. Three deformation modes, corresponding to the pronunciation characteristics of the muscular soft tissue of the lips and tongue, the up-down movement of the chin, and the relatively fixed articulators, are defined and integrated. Shape-blending functions between the segmented phonemes of natural speech input synthesize complete utterances. Animations of confusable phonemes and minimal pairs were shown to English teachers and learners in a perception test. The results show that the proposed method realistically reflects actual phonetic pronunciation.
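To make the phoneme-level shape blending concrete, here is a minimal sketch of how key shapes for segmented phonemes could be interpolated over an utterance. It assumes per-phoneme vertex arrays for the talking-head mesh and a cosine easing weight; the function names, the blending curve, and the frame rate are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Illustrative sketch only: the vertex arrays, the cosine blending weight, and
# the frame rate are hypothetical placeholders, not the paper's actual method.

def blend_weight(t: float) -> float:
    """Smooth 0-to-1 easing weight; a common choice for viseme blending."""
    return 0.5 * (1.0 - np.cos(np.pi * t))

def blend_shapes(shape_a: np.ndarray, shape_b: np.ndarray, t: float) -> np.ndarray:
    """Interpolate (N x 3) mesh vertices between two phoneme key shapes."""
    w = blend_weight(t)
    return (1.0 - w) * shape_a + w * shape_b

def synthesize_utterance(key_shapes, durations, fps=25):
    """Generate per-frame meshes for a phoneme sequence.

    key_shapes: list of (N x 3) vertex arrays, one per segmented phoneme.
    durations:  transition durations in seconds between consecutive phonemes
                (e.g., taken from the segmentation of the natural speech input).
    """
    frames = []
    for a, b, dur in zip(key_shapes, key_shapes[1:], durations):
        n = max(1, int(dur * fps))
        for i in range(n):
            frames.append(blend_shapes(a, b, i / n))
    frames.append(key_shapes[-1])  # hold the final phoneme's shape
    return frames
```

In a full pipeline of this kind, the key shapes would come from the calibrated external (video) and internal (videofluoroscopy) articulator measurements, and the three deformation modes would constrain how each vertex group is allowed to move.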



Author information


Corresponding author

Correspondence to Hui Chen.


About this article

Cite this article

Chen, H., Wang, L., Liu, W. et al. Combined X-ray and facial videos for phoneme-level articulator dynamics. Vis Comput 26, 477–486 (2010). https://doi.org/10.1007/s00371-010-0434-1

