
Multi-modal recording and modeling of vocal tract movements


Abstract

The complexity of vocal tract movement makes it difficult to record complete information about the vocal tract during speech. Dynamic articulation has been acquired with a variety of instruments, each of which has its own advantages and shortcomings. Because no single recording technique can capture vocal tract movements adequately, multiple instruments are often applied simultaneously. We therefore combined an ultrasound system with an electromagnetic articulography (EMA) system to record multi-modal tongue movement. Vocal tract movement data were obtained with an ultrasound-based speech recording system developed by us, which records ultrasound images together with synchronized audio signals. The EMA system simultaneously collected articulatory data along with the audio. The EMA and ultrasound data were registered to the same audio signal, after which the two data sets were fused at each time point. In addition, a method for vocal tract shape reconstruction and modeling from the ultrasound data set is proposed, based on an active shape model. The average reconstruction error does not exceed 1.26 mm.
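As a rough illustration of the per-time-point fusion described above, the sketch below interpolates EMA coil trajectories at the timestamps of the ultrasound frames, assuming both streams carry timestamps referenced to the shared audio clock. The sampling rates, coil count, and all function and variable names are illustrative assumptions, not details taken from the paper.

```python
# Sketch: fusing EMA coil trajectories with ultrasound frames on a shared
# audio timeline. Assumes both streams are timestamped against the same
# audio clock; names and rates here are illustrative, not from the paper.
import numpy as np

def fuse_streams(ema_t, ema_xy, us_t):
    """Interpolate EMA coil positions at each ultrasound frame time.

    ema_t  : (N,)       EMA sample times in seconds (audio clock)
    ema_xy : (N, C, 2)  x/y positions of C coils per EMA sample
    us_t   : (M,)       ultrasound frame times in seconds (same clock)
    returns (M, C, 2) coil positions aligned to the ultrasound frames
    """
    n_frames, n_coils = len(us_t), ema_xy.shape[1]
    fused = np.empty((n_frames, n_coils, 2))
    for c in range(n_coils):
        for d in range(2):  # x and y coordinates
            fused[:, c, d] = np.interp(us_t, ema_t, ema_xy[:, c, d])
    return fused

# Example: 200 Hz EMA, 60 fps ultrasound, 2 s of speech (synthetic data)
ema_t = np.arange(0, 2, 1 / 200.0)
ema_xy = np.random.randn(len(ema_t), 4, 2)   # 4 coils
us_t = np.arange(0, 2, 1 / 60.0)
aligned = fuse_streams(ema_t, ema_xy, us_t)
print(aligned.shape)                          # (120, 4, 2)
```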
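Likewise, a minimal sketch of the active-shape-model idea behind the contour reconstruction: a point-distribution model is trained by PCA over aligned tongue contours, and a new contour is rebuilt from its leading mode weights. The synthetic training data and the number of retained modes are placeholders; the paper's actual landmarking and fitting procedure is not reproduced here.

```python
# Sketch: active-shape-model style reconstruction of a tongue contour.
# PCA over aligned training contours yields a mean shape and variance
# modes; a contour is approximated as x ~ mean + P b.
import numpy as np

def train_asm(shapes, k=5):
    """shapes: (S, 2P) flattened (x1, y1, ..., xP, yP) training contours."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    cov = centered.T @ centered / (len(shapes) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]    # top-k variance modes
    return mean, eigvecs[:, order]

def reconstruct(contour, mean, modes):
    """Project a contour onto the model and rebuild it."""
    b = modes.T @ (contour - mean)           # mode weights
    return mean + modes @ b

# Synthetic example: 50 training contours of 20 landmarks each
rng = np.random.default_rng(0)
shapes = rng.normal(size=(50, 40))
mean, modes = train_asm(shapes, k=5)
rebuilt = reconstruct(shapes[0], mean, modes)
err = np.abs(rebuilt - shapes[0]).mean()     # mean landmark error
print(f"mean reconstruction error: {err:.3f}")
```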




Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) (Nos. 61175016 and 61304250) and by a 973 project (No. 2013CB329305).

Author information


Corresponding author

Correspondence to Wenhuan Lu.


About this article


Cite this article

Wei, J., Wang, S., Lu, W. et al. Multi-modal recording and modeling of vocal tract movements. Multimed Tools Appl 75, 5247–5263 (2016). https://doi.org/10.1007/s11042-015-3040-4
