Abstract
The paper compares different approaches in the phrase boundary detection issue, based on the data gained from speech corpora recorded for the purpose of the text-to-speech (TTS) system. It is showed that conditional random fields model can outperform basic deterministic and classification-based algorithms both in speaker-dependent and speaker independent phrasing. The results on manually annotated sentences with phrase breaks are presented here as well.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
The term ‘juncture’ was adopted from [23] and refers to required phrase breaks.
References
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37 (1960)
Grůber, M., Matoušek, J.: Listening-test-based annotation of communicative functions for expressive speech synthesis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2010. LNCS, vol. 6231, pp. 283–290. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15760-8_36
Hirschberg, J., Prieto, P.: Training intonational phrasing rules automatically for English and Spanish text-to-speech. Speech Communication 18(3), 281–290 (1996)
Jůzová, M., Romportl, J., Tihelka, D.: Speech corpus preparation for voice banking of laryngectomised patients. In: Král, P., Matoušek, V. (eds.) TSD 2015. LNCS, vol. 9302, pp. 282–290. Springer, Cham (2015). doi:10.1007/978-3-319-24033-6_32
Jůzová, M., Tihelka, D., Matoušek, J.: Designing high-coverage multi-level text corpus for non-professional-voice conservation. In: Ronzhin, A., Potapova, R., Németh, G. (eds.) SPECOM 2016. LNCS (LNAI), vol. 9811, pp. 207–215. Springer, Cham (2016). doi:10.1007/978-3-319-43958-7_24
Jůzová, M.: Prosodic phrase boundary classification based on Czech speech corpora. In: Text, Speech and Dialogue. LNCS. Springer, Berlin, Heidelberg (2017)
Jůzová, M., Tihelka, D., Matoušek, J., Hanzlíček, Z.: Voice conservation and TTS system for people facing total laryngectomy. In: Proceedings of Interspeech 2017 (2017)
Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of 18th ICML 2001, pp. 282–289. Morgan Kaufmann Publishers Inc. (2001)
Legát, M., Matoušek, J., Tihelka, D.: A robust multi-phase pitch-mark detection algorithm. In: Proceedings of Interspeech 2007, pp. 1641–1644 (2007)
Louw, A., Moodley, A.: Speaker specific phrase break modeling with conditional random fields for text-to-speech. In: Proceedings of PRASA-RobMech 2016, pp. 1–6 (2016)
Matoušek, J., Romportl, J.: Automatic pitch-synchronous phonetic segmentation. In: Proceedings of Interspeech 2008. pp. 1626–1629. ISCA (2008)
Matoušek, J., Tihelka, D., Romportl, J.: Current state of Czech text-to-speech system ARTIC. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS, vol. 4188, pp. 439–446. Springer, Heidelberg (2006). doi:10.1007/11846406_55
Matoušek, J., Romportl, J.: Recording and annotation of speech corpus for Czech unit selection speech synthesis. In: Matoušek, V., Mautner, P. (eds.) TSD 2007. LNCS, vol. 4629, pp. 326–333. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74628-7_43
Mishra, T., Jun Kim, Y., Bangalore, S.: Intonational phrase break prediction for text-to-speech synthesis using dependency relations. In: Proceedings of ICASSP 2015, pp. 4919–4923 (2015)
Palková, Z.: Rytmická výstavba prozaického textu. Studia ČSAV; čis. 13/1974, Academia (1974)
Parlikar, A., Black, A.W.: Data-driven phrasing for speech synthesis in low-resource languages. In: Proceedings of ICASSP 2012, pp. 4013–4016 (2012)
Prahallad, K., Raghavendra, E.V., Black, A.W.: Learning speaker-specific phrase breaks for text-to-speech systems. In: Proceedings of SSW 2010, Kyoto, Japan, September 22–24, 2010. pp. 162–166 (2010)
Romportl, J.: Statistical evaluation of prosodic phrases in the Czech language. In: Proceedings of the Speech Prosody 2008, pp. 755–758. Editora RG/CNPq, Campinas (2008)
Romportl, J.: Automatic prosodic phrase annotation in a corpus for speech synthesis. In: Proceedings of Speech Prosody 2010. University of Illionois, Chicago (2010)
Romportl, J., Matoušek, J.: Several aspects of machine-driven phrasing in text-to-speech systems. Prague Bull. Math. Linguist. 95, 51–61 (2011)
Salzberg, S.L.: On comparing classifiers: pitfalls to avoid and a recommended approach. Data Min. Knowl. Disc. 1(3), 317–328 (1997)
Sun, X., Applebaum, T.H.: Intonational phrase break prediction using decision tree and n-gram model. In: Proceedings of Eurospeech 2001, pp. 3–7 (2001)
Taylor, P.: Text-to-Speech Synthesis, 1st edn. Cambridge University Press, New York, NY, USA (2009)
Taylor, P., Black, A.W.: Assigning phrase breaks from part-of-speech sequences. Comput. Speech Lang. 12(2), 99–117 (1998)
Tihelka, D., Grůber, M., Hanzlíček, Z.: Robust methodology for TTS enhancement evaluation. In: Habernal, I., Matoušek, V. (eds.) TSD 2013. LNCS, vol. 8082, pp. 442–449. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40585-3_56
Acknowledgement
This research was supported by Ministry of Education, Youth and Sports of the Czech Republic, project No. LO1506, and by the grant of the University of West Bohemia, project No. SGS-2016-039.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Jůzová, M. (2017). CRF-Based Phrase Boundary Detection Trained on Large-Scale TTS Speech Corpora. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science(), vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_26
Download citation
DOI: https://doi.org/10.1007/978-3-319-66429-3_26
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer ScienceComputer Science (R0)