Abstract
To benefit from both the maximum-margin nature of support vector machines (SVMs) and the ability of conditional random fields (CRFs) to model correlations between neighboring features, this paper uses a two-stage SVM/CRF sequence classifier to detect prosodic events from acoustic prosodic cues. In the first stage, SVMs are trained to predict the label of each sequence element; in the second stage, a CRF is trained to predict the output label sequence, taking the labels output by the previously trained SVMs as its input. In our experiments on the Boston University Radio News Corpus, the training and test data are selected so that their transcriptions are distinct. The results show that the two-stage classifier achieves competitive performance and that the second-stage CRF improves the performance of the overall classifier.
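The data flow of such a two-stage classifier can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation and rests on several assumptions: the two per-syllable features (imagined here as z-scored pitch and energy) and all of their values are hypothetical; stage 1 is a linear SVM fitted with Pegasos-style subgradient descent; and stage 2 is a simplified stand-in for a CRF, namely a first-order chain whose emission and transition scores are add-one-smoothed counts, decoded with Viterbi. All function names are illustrative.

```python
import math
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Stage 1: linear SVM fitted by Pegasos-style subgradient descent on the
    hinge loss. Labels y are in {-1, +1}; the bias is a constant feature."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                    # decaying step size
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(a * b for a, b in zip(w, xi))
            w = [(1.0 - eta * lam) * a for a in w]   # shrink (L2 penalty)
            if margin < 1.0:                         # margin violated
                w = [a + eta * y[i] * b for a, b in zip(w, xi)]
    return w

def svm_predict(w, x):
    """Return 1 (prosodic event) or 0 (no event) for one feature vector."""
    score = sum(a * b for a, b in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else 0

def fit_chain(svm_seqs, gold_seqs, n=2):
    """Stage 2 stand-in (NOT a trained CRF): add-one-smoothed log scores for
    gold label -> SVM label (emission) and gold label -> next gold label."""
    trans = [[1.0] * n for _ in range(n)]
    emit = [[1.0] * n for _ in range(n)]
    for obs, gold in zip(svm_seqs, gold_seqs):
        for t, (o, g) in enumerate(zip(obs, gold)):
            emit[g][o] += 1
            if t > 0:
                trans[gold[t - 1]][g] += 1
    def to_log(rows):
        return [[math.log(v / sum(row)) for v in row] for row in rows]
    return to_log(trans), to_log(emit)

def viterbi(obs, log_trans, log_emit):
    """Most likely label sequence given the stage-1 labels `obs`."""
    n = len(log_trans)
    score = [log_emit[s][obs[0]] for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + log_trans[p][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[s][o])
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def two_stage_predict(w, chain, feats):
    """Run stage 1 on each element, then decode the whole sequence."""
    svm_labels = [svm_predict(w, x) for x in feats]
    return viterbi(svm_labels, *chain)

# Hypothetical toy data: (features per syllable, gold labels; 1 = event).
train = [
    ([[1.5, 1.0], [-1.2, -0.9], [-1.0, -1.1], [1.3, 0.8]], [1, 0, 0, 1]),
    ([[-0.9, -1.4], [1.1, 1.2], [-1.3, -0.8], [-1.0, -1.0]], [0, 1, 0, 0]),
]
X = [x for feats, _ in train for x in feats]
y = [1 if g == 1 else -1 for _, gold in train for g in gold]
w = train_linear_svm(X, y)
svm_seqs = [[svm_predict(w, x) for x in feats] for feats, _ in train]
chain = fit_chain(svm_seqs, [gold for _, gold in train])
print(two_stage_predict(w, chain, [[1.4, 0.9], [-1.1, -1.0], [-0.8, -1.2]]))
```

The second stage sees only the label sequence produced by the first stage, which mirrors the arrangement described in the abstract. A real system would replace both stand-ins with properly trained SVM and CRF models and would hold out data for fitting the second stage.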
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, M., Liu, W., Yang, Z., Hu, P. (2012). Automatic Prosodic Events Detection Using a Two-Stage SVM/CRF Sequence Classifier with Acoustic Features. In: Liu, CL., Zhang, C., Wang, L. (eds) Pattern Recognition. CCPR 2012. Communications in Computer and Information Science, vol 321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33506-8_70
DOI: https://doi.org/10.1007/978-3-642-33506-8_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33505-1
Online ISBN: 978-3-642-33506-8
eBook Packages: Computer Science, Computer Science (R0)