
Exploiting articulatory features for pitch accent detection

Journal of Zhejiang University SCIENCE C

Abstract

Articulatory features describe how the articulators are involved in producing speech sounds. Because speakers tend to pronounce accented phonemes in a more exaggerated way, articulatory features can aid pitch accent detection. Instead of using actual articulatory features obtained by directly measuring the articulators, we use the posterior probabilities produced by multi-layer perceptrons (MLPs) as articulatory features. The MLP inputs are frame-level acoustic features pre-processed with the split temporal context-2 (STC-2) approach; the outputs are posterior probabilities over a set of articulatory attributes. These posterior probabilities are averaged piecewise within each syllable and serve as syllable-level articulatory features. To our knowledge, this work is the first to introduce articulatory features into pitch accent detection. Combining the articulatory features extracted in this way with traditional acoustic features improves pitch accent detection accuracy by about 2%.
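The syllable-level feature extraction described in the abstract, averaging MLP posterior probabilities piecewise within each syllable, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the default of three pieces, and the toy dimensions are not taken from the paper.

```python
import numpy as np

def syllable_articulatory_features(frame_posteriors, syllable_bounds, n_pieces=3):
    """Turn frame-level articulatory posteriors into syllable-level features.

    frame_posteriors: (n_frames, n_attributes) array of MLP posterior
                      probabilities for the articulatory attributes
    syllable_bounds:  list of (start_frame, end_frame) pairs, end exclusive
    n_pieces:         number of consecutive sub-segments averaged per syllable

    Each syllable is split into `n_pieces` consecutive chunks; the posteriors
    are averaged inside each chunk and the chunk means are concatenated,
    giving a fixed-length (n_pieces * n_attributes) vector per syllable.
    """
    feats = []
    for start, end in syllable_bounds:
        seg = frame_posteriors[start:end]
        # Assumes each syllable spans at least n_pieces frames.
        chunks = np.array_split(seg, n_pieces)
        feats.append(np.concatenate([c.mean(axis=0) for c in chunks]))
    return np.vstack(feats)
```

With, for example, three pieces and a dozen articulatory-attribute posteriors per frame, each syllable yields a fixed-length vector that can be appended to conventional acoustic features before classification.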



Author information

Corresponding author

Correspondence to Junhong Zhao.

Additional information

This work was supported by the National Natural Science Foundation of China (Nos. 61370034, 61273268, and 61005019).


About this article

Cite this article

Zhao, J., Xu, J., Zhang, Wq. et al. Exploiting articulatory features for pitch accent detection. J. Zhejiang Univ. - Sci. C 14, 835–844 (2013). https://doi.org/10.1631/jzus.C1300104

