Abstract
Automatic prosodic break detection and annotation are important for both speech understanding and natural speech synthesis. In this paper, we discuss automatic prosodic break detection and feature analysis. The paper makes two contributions. First, we use a classifier combination method to detect Mandarin and English prosodic breaks using acoustic, lexical, and syntactic evidence. The proposed method outperforms the baseline system and previously reported results on both the Mandarin prosodic annotation corpus (Annotated Speech Corpus of Chinese Discourse) and the English prosodic annotation corpus (Boston University Radio News Corpus). Second, we present a feature analysis for prosodic break detection. The roles of different features, such as duration, pitch, energy, and intensity, are analyzed and compared for Mandarin and English prosodic break detection. Based on the feature analysis, we also verify some linguistic conclusions.
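To make the idea of classifier combination concrete, the following is a minimal sketch and not the method used in the paper, whose combination scheme is not described in this abstract. It assumes that three separate classifiers (acoustic, lexical, syntactic) each output a probability of a prosodic break at every candidate boundary, and it fuses them with a weighted average followed by a threshold. The function names, weights, probabilities, and threshold are all hypothetical values chosen for illustration.

```python
from typing import Dict, List


def combine_posteriors(
    posteriors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-stream P(break) estimates at each boundary.

    Assumption: every stream scores the same boundaries in the same order.
    """
    total_weight = sum(weights[name] for name in posteriors)
    n_boundaries = len(next(iter(posteriors.values())))
    combined = []
    for i in range(n_boundaries):
        score = sum(weights[name] * probs[i] for name, probs in posteriors.items())
        combined.append(score / total_weight)
    return combined


def detect_breaks(combined: List[float], threshold: float = 0.5) -> List[bool]:
    """Label a boundary as a break when the fused probability reaches the threshold."""
    return [p >= threshold for p in combined]


# Hypothetical per-stream break probabilities for three candidate boundaries.
posteriors = {
    "acoustic": [0.9, 0.2, 0.6],
    "lexical": [0.7, 0.1, 0.3],
    "syntactic": [0.8, 0.4, 0.3],
}
# Hypothetical weights; in practice they would be tuned on held-out data.
weights = {"acoustic": 2.0, "lexical": 1.0, "syntactic": 1.0}

combined = combine_posteriors(posteriors, weights)
breaks = detect_breaks(combined)
print(breaks)
```

In this toy example the third boundary is rejected even though the acoustic stream alone favors a break, which illustrates how combining evidence streams can overrule a single classifier.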
Additional information
Supported by the National Natural Science Foundation of China under Grant Nos. 90820303, 90820011, and the Natural Science Foundation of Shandong Province of China under Grant No. ZR2011FQ024.
About this article
Cite this article
Ni, CJ., Zhang, AY., Liu, WJ. et al. Automatic Prosodic Break Detection and Feature Analysis. J. Comput. Sci. Technol. 27, 1184–1196 (2012). https://doi.org/10.1007/s11390-012-1295-z
DOI: https://doi.org/10.1007/s11390-012-1295-z