
Articulatory and acoustic analyses of Mandarin sentences with different emotions for speaking training of dysphonic disorders

  • Original Research
  • Published in the Journal of Ambient Intelligence and Humanized Computing

Abstract

The aim of the current study was to analyze the articulatory and acoustic features of sentences produced by Mandarin speakers under different emotions. For the articulatory features, the movements of the lips and tongue during speech production, and especially their velocities, were analyzed; for the acoustic features, formants, fundamental frequency, amplitude and speaking rate were analyzed. Fourteen subjects with a standard Mandarin accent were recruited for the experiment. The subjects were asked to produce specified sentences under different emotions (anger, sadness, happiness and neutral) for subsequent articulatory and acoustic analyses. The results indicated that emotion clearly influenced the motion of the articulators (tongue and lips): the motion ranges of the tongue and lips under anger and happiness were larger than under sadness and the neutral condition. The results are discussed with respect to the relations between the acoustic and articulatory features of sentences, and the similarities and differences between multi-syllable utterances and vowels. This study can serve as a basis for constructing a functional relation between the articulatory and acoustic parameters of emotional speech, in order to help individuals with dysphonic disorders in speech training.
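The abstract names the acoustic measures (fundamental frequency, formants, amplitude, speaking rate) and the articulatory measure (lip and tongue velocity). The sketch below is a minimal illustration, not the authors' actual pipeline, of how such features could be extracted with the praat-parselmouth library and NumPy; the file path, syllable count, analysis defaults, and the assumption that articulator positions arrive as an EMA-style (samples × 3) trace in millimetres are all assumptions for illustration.

```python
# Hedged sketch: acoustic and articulatory measures of the kind described in the abstract.
import numpy as np
import parselmouth


def acoustic_features(wav_path: str, n_syllables: int) -> dict:
    """F0, F1/F2, intensity and speaking rate for one recorded sentence (illustrative)."""
    snd = parselmouth.Sound(wav_path)

    # Fundamental frequency: unvoiced frames are returned as 0 Hz and dropped.
    pitch = snd.to_pitch()
    f0_all = pitch.selected_array["frequency"]
    f0 = f0_all[f0_all > 0]

    # Formants F1/F2, sampled only at voiced pitch frames.
    formant = snd.to_formant_burg()
    voiced_times = [t for t, v in zip(pitch.xs(), f0_all) if v > 0]
    f1 = np.array([formant.get_value_at_time(1, t) for t in voiced_times])
    f2 = np.array([formant.get_value_at_time(2, t) for t in voiced_times])

    # Amplitude (intensity contour in dB) and speaking rate (syllables per second).
    intensity = snd.to_intensity()
    return {
        "f0_mean_hz": float(np.mean(f0)),
        "f0_range_hz": float(np.max(f0) - np.min(f0)),
        "f1_mean_hz": float(np.nanmean(f1)),
        "f2_mean_hz": float(np.nanmean(f2)),
        "intensity_mean_db": float(np.mean(intensity.values)),
        "speaking_rate_syll_per_s": n_syllables / snd.duration,
    }


def articulator_speed(positions_mm: np.ndarray, fs_hz: float) -> np.ndarray:
    """Speed (mm/s) of one articulator sensor (e.g. tongue tip or lower lip)
    from its (n_samples, 3) position trace, via finite differences."""
    velocity = np.gradient(positions_mm, 1.0 / fs_hz, axis=0)
    return np.linalg.norm(velocity, axis=1)
```

For example, `acoustic_features("sentence_anger_01.wav", n_syllables=7)` would return one feature dictionary per utterance, and the per-sample output of `articulator_speed` can be summarized (peak or mean speed) to compare articulator motion across emotions.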



Acknowledgements

Thanks are due to all the subjects in the current experiment, to Xueying Zhang and Shufei Duan for technical assistance, and to Jianzheng Yan and Dong Li for assistance in data collection.

Funding

This study was supported by the National Natural Science Foundation of China [Grant Number 61371193].

Author information

Corresponding author

Correspondence to Xueying Zhang.

Ethics declarations

Conflict of interest

The authors report no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ren, G., Zhang, X. & Duan, S. Articulatory and acoustic analyses of Mandarin sentences with different emotions for speaking training of dysphonic disorders. J Ambient Intell Human Comput 11, 561–571 (2020). https://doi.org/10.1007/s12652-018-0942-9

