Abstract
The complexity of vocal tract movement makes it difficult to record complete information about the vocal tract during speech. Dynamic articulation has been captured with a variety of instruments, each of which has its advantages and shortcomings. However, no single recording technique can measure vocal tract movements adequately, and this has led to the simultaneous application of multiple instruments. We therefore combined an ultrasound system with an electromagnetic articulography (EMA) system to record multi-modal tongue movement. The ultrasound data were obtained with an ultrasound-based speech recording system that we developed, which records ultrasound images together with synchronized audio signals. The EMA system likewise collects articulatory data simultaneously with audio. The EMA and ultrasound data were registered and matched to the same audio signal, after which the two data sets were fused at each time point. In addition, we propose a method for vocal tract shape reconstruction and modeling on the ultrasound dataset using an active shape model. The average reconstruction error does not exceed 1.26 mm.
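As a rough illustration of the shape-modeling step, the sketch below builds the statistical part of an active shape model (a point distribution model obtained by principal component analysis, in the sense of Cootes et al. 1995) and measures a mean point-to-point reconstruction error. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the tongue contours have already been extracted, resampled to a fixed number of points, and aligned, and it omits the image-search part of the model. The function names and the 95% variance threshold are hypothetical choices made for this example.

```python
import numpy as np


def fit_shape_model(contours, var_kept=0.95):
    """Fit a PCA point distribution model.

    contours: array of shape (n_shapes, n_points, 2), assumed pre-aligned.
    Returns the mean shape vector, the retained modes (one per row),
    and the variance explained by each retained mode.
    """
    X = contours.reshape(len(contours), -1)       # one flattened shape per row
    mean = X.mean(axis=0)
    # Rows of Vt are the principal modes of the centered shape vectors.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var = S ** 2 / (len(X) - 1)
    cum = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(cum, var_kept) + 1)   # fewest modes reaching var_kept
    return mean, Vt[:k], var[:k]


def reconstruct(contour, mean, modes):
    """Project one contour onto the model and map it back to point coordinates."""
    b = modes @ (contour.reshape(-1) - mean)      # shape parameters
    return (mean + modes.T @ b).reshape(-1, 2)


def mean_point_error(a, b):
    """Mean Euclidean distance between corresponding points (same units as input)."""
    return float(np.linalg.norm(a - b, axis=1).mean())


# Synthetic stand-in data (hypothetical): arc-like contours with two
# deformation modes plus a little noise. Real input would be traced contours.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, np.pi, 20)
base = np.stack([30 * np.cos(theta), 30 * np.sin(theta)], axis=1)
raise_mode = np.stack([np.zeros_like(theta), np.sin(theta)], axis=1)
shift_mode = np.stack([np.ones_like(theta), np.zeros_like(theta)], axis=1)
contours = np.array([
    base + rng.normal(0, 3) * raise_mode + rng.normal(0, 2) * shift_mode
    + rng.normal(0, 0.1, base.shape)
    for _ in range(50)
])

mean, modes, variances = fit_shape_model(contours)
error = mean_point_error(contours[0], reconstruct(contours[0], mean, modes))
print(f"modes kept: {len(modes)}, reconstruction error: {error:.3f}")
```

If the contours are expressed in millimetres, the value returned by `mean_point_error` is in millimetres as well, which is the kind of quantity the reported error bound refers to; the numbers produced on the synthetic data here say nothing about the paper's result.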
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC; Nos. 61175016 and 61304250) and by a 973 project (No. 2013CB329305).
Cite this article
Wei, J., Wang, S., Lu, W. et al. Multi-modal recording and modeling of vocal tract movements. Multimed Tools Appl 75, 5247–5263 (2016). https://doi.org/10.1007/s11042-015-3040-4