Multi-modal recording and modeling of vocal tract movements

Wei, Jianguo; Wang, Song; Lu, Wenhuan; Hou, Qingzhi; Fang, Qiang; Dang, Jianwu

doi:10.1007/s11042-015-3040-4

Published: 14 December 2015

Volume 75, pages 5247–5263 (2016)

Multimedia Tools and Applications

Jianguo Wei¹, Song Wang¹, Wenhuan Lu¹, Qingzhi Hou², Qiang Fang³ & Jianwu Dang²,⁴

Abstract

The complexity of vocal tract movement makes it difficult to record complete information about the vocal tract during speech. Dynamic articulation has been measured with a variety of instruments, each with its own advantages and shortcomings. Because vocal tract movements are difficult to capture with a single recording technique, multiple instruments are often used simultaneously. We therefore combined an ultrasound system with an electromagnetic articulography (EMA) system to record multi-modal tongue movement. Vocal tract movement data were obtained with an ultrasound-based speech recording system that we developed, which records ultrasound images and audio signals synchronously. The EMA system simultaneously collects articulatory data together with audio. The EMA and ultrasound data were registered and matched to the same audio signal, after which the two datasets were fused at each time point. In addition, we propose a method for reconstructing and modeling vocal tract shape from the ultrasound data using an active shape model. The average reconstruction error does not exceed 1.26 mm.
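The abstract says the EMA and ultrasound streams were registered by matching them to the same audio signal and then fused at each time point, but it does not describe the exact procedure. The following is a minimal sketch of one common approach, not the authors' method. It assumes both devices' audio tracks have been resampled to a common rate, that sample n of each track corresponds to time n / rate on that device's clock, and that a single constant offset (no clock drift) separates the two clocks. The offset is estimated by brute-force cross-correlation, and one EMA channel is then linearly interpolated at each ultrasound frame time. The function names `estimate_lag`, `interp` and `fuse` are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple


def estimate_lag(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the integer lag maximizing sum(ref[n] * other[n + lag]).

    A positive lag means events occur `lag` samples later in `other` than in `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            ref[n] * other[n + lag]
            for n in range(len(ref))
            if 0 <= n + lag < len(other)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def interp(times: Sequence[float], values: Sequence[float], t: float) -> float:
    """Linearly interpolate one channel at time t, clamping at both ends.

    Assumes `times` is strictly increasing and has at least two samples.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect.bisect_right(times, t)
    w = (t - times[i - 1]) / (times[i] - times[i - 1])
    return values[i - 1] + w * (values[i] - values[i - 1])


def fuse(
    frame_times: Sequence[float],  # ultrasound frame times (ultrasound-audio clock, s)
    ema_times: Sequence[float],    # EMA sample times (EMA-audio clock, s)
    ema_values: Sequence[float],   # one EMA channel, e.g. one sensor's y coordinate
    lag: int,                      # estimate_lag(ultrasound_audio, ema_audio, max_lag)
    audio_rate: float,             # common audio sample rate (Hz)
) -> List[Tuple[float, float]]:
    """Pair each ultrasound frame time with the EMA value at the same instant."""
    offset = lag / audio_rate  # seconds to add to map ultrasound time onto EMA time
    return [(t, interp(ema_times, ema_values, t + offset)) for t in frame_times]
```

In practice `fuse` would be applied once per EMA sensor coordinate, and `max_lag` would be limited to the plausible start-up delay between the two devices. If the clocks drift, a single offset is not enough and a time-varying mapping would be needed.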
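An active shape model is built on a statistical point distribution model. Aligned training contours are summarized by a mean shape x̄ and a matrix P of principal modes, and a contour x is approximated as x ≈ x̄ + P b with shape parameters b = Pᵀ(x − x̄). The sketch below illustrates only that shape-model step, together with one plausible point-to-point error in millimetres. The paper's exact model and error definition are not given in the abstract, so this is not the authors' implementation. The sketch makes several assumptions: contours are already Procrustes-aligned and stored as [x0, y0, x1, y1, ...] in mm, and power iteration stands in for a library eigensolver. It also omits the image-search step and the usual limits on b. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Shape = List[float]  # [x0, y0, x1, y1, ...] in mm, assumed already aligned


def mean_shape(shapes: Sequence[Shape]) -> Shape:
    n = len(shapes)
    return [sum(s[j] for s in shapes) / n for j in range(len(shapes[0]))]


def covariance(shapes: Sequence[Shape], mean: Shape) -> List[List[float]]:
    d, n = len(mean), len(shapes)
    centered = [[s[j] - mean[j] for j in range(d)] for s in shapes]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]


def principal_modes(cov: List[List[float]], num_modes: int, iters: int = 500) -> List[Shape]:
    """Leading eigenvectors of a symmetric covariance matrix by power iteration,
    re-orthogonalizing against modes already found (stand-in for an eigensolver)."""
    d = len(cov)
    modes: List[Shape] = []
    for _ in range(num_modes):
        v = [1.0 + 0.01 * i for i in range(d)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            for m in modes:  # remove components along earlier modes
                p = sum(a * b for a, b in zip(w, m))
                w = [a - p * b for a, b in zip(w, m)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:
                return modes  # no remaining variance to explain
            v = [a / norm for a in w]
        modes.append(v)
    return modes


def reconstruct(shape: Shape, mean: Shape, modes: Sequence[Shape]) -> Shape:
    """x ~ mean + P b, with shape parameters b = P^T (x - mean)."""
    diff = [a - m for a, m in zip(shape, mean)]
    b = [sum(p * q for p, q in zip(diff, mode)) for mode in modes]
    out = list(mean)
    for bk, mode in zip(b, modes):
        out = [o + bk * mk for o, mk in zip(out, mode)]
    return out


def mean_point_error(a: Shape, b: Shape) -> float:
    """Mean Euclidean distance between corresponding (x, y) landmarks, in mm."""
    k = len(a) // 2
    return sum(
        math.hypot(a[2 * i] - b[2 * i], a[2 * i + 1] - b[2 * i + 1]) for i in range(k)
    ) / k
```

For example, one could fit `modes = principal_modes(covariance(train, mean), m)` on training contours. Averaging `mean_point_error(x, reconstruct(x, mean, modes))` over held-out contours then gives a reconstruction error in millimetres. That figure is comparable in kind to the one quoted in the abstract, but not necessarily computed the same way.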


Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) (Nos. 61175016 and 61304250) and by a 973 project (No. 2013CB329305).

Author information

Authors and Affiliations

School of Computer Software, Tianjin University, Tianjin, China
Jianguo Wei, Song Wang & Wenhuan Lu
Tianjin Key Laboratory of Cognitive Computation and Application, Tianjin University, Tianjin, China
Qingzhi Hou & Jianwu Dang
Phonetics Laboratory, Institute of Linguistics, Chinese Academy of Social Sciences, Beijing, China
Qiang Fang
Japan Advanced Institute of Science and Technology, Nomi, Japan
Jianwu Dang

Corresponding author

Correspondence to Wenhuan Lu.

About this article

Cite this article

Wei, J., Wang, S., Lu, W. et al. Multi-modal recording and modeling of vocal tract movements. Multimed Tools Appl 75, 5247–5263 (2016). https://doi.org/10.1007/s11042-015-3040-4

Received: 21 April 2015
Revised: 06 October 2015
Accepted: 21 October 2015
Published: 14 December 2015
Issue Date: May 2016
DOI: https://doi.org/10.1007/s11042-015-3040-4
