Post-processing of Automatic Segmentation of Speech Using Dynamic Programming

Szymański, Marcin; Grocholewski, Stefan

doi:10.1007/11846406_66

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4188)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Building unit-selection speech synthesisers requires precise annotation of large speech corpora. Manual segmentation of speech is very laborious, hence the need for automatic segmentation algorithms. Because the common HMM-based method was observed to be prone to systematic errors, boundary refinement approaches such as boundary-specific correction were introduced.

Last year, a dynamic programming fine-tuning approach was proposed that combined two sources of information: the boundary error distribution and statistical models of MFCC features at boundaries. In this paper we verify the usefulness of incorporating two further sources: boundary energy dynamics models and signal periodicity information.
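
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a dynamic programming boundary fine-tuning of this general kind could look, not the authors' implementation. It assumes each first-pass boundary may move by a small number of frames, that each candidate position is scored by a shift log-prior (standing in for the boundary error distribution) plus a boundary log-likelihood (standing in for the acoustic boundary models), and that boundaries must stay in order. The names `refine_boundaries`, `shift_logprior`, `boundary_loglik`, and `min_gap` are hypothetical.

```python
import math


def refine_boundaries(initial, offsets, shift_logprior, boundary_loglik, min_gap=1):
    """Viterbi-style search for the best joint shift of all boundaries.

    initial          -- first-pass boundary positions in frames, increasing
    offsets          -- candidate shifts in frames, e.g. range(-2, 3)
    shift_logprior   -- f(k, d): log-probability of shifting boundary k by d
                        (hypothetical stand-in for an error distribution)
    boundary_loglik  -- f(k, t): log-likelihood of boundary k at frame t
                        (hypothetical stand-in for acoustic boundary models)
    min_gap          -- minimum distance in frames between adjacent boundaries
    """
    offsets = list(offsets)
    n = len(initial)
    if n == 0:
        return []

    # best[j]: best total score with the current boundary at offsets[j]
    best = [shift_logprior(0, d) + boundary_loglik(0, initial[0] + d)
            for d in offsets]
    back = []  # back[k-1][j]: chosen predecessor index for boundary k

    for k in range(1, n):
        new_best, ptr = [], []
        for d in offsets:
            t = initial[k] + d
            local = shift_logprior(k, d) + boundary_loglik(k, t)
            # Pick the best predecessor that keeps the boundaries ordered.
            prev_score, prev_j = -math.inf, -1
            for j, d_prev in enumerate(offsets):
                ordered = initial[k - 1] + d_prev + min_gap <= t
                if ordered and best[j] > prev_score:
                    prev_score, prev_j = best[j], j
            new_best.append(prev_score + local if prev_j >= 0 else -math.inf)
            ptr.append(prev_j)
        best = new_best
        back.append(ptr)

    j = max(range(len(offsets)), key=lambda i: best[i])
    if best[j] == -math.inf:
        raise ValueError("no ordered boundary sequence within the given offsets")

    # Trace the chosen offsets back from the last boundary to the first.
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [initial[k] + offsets[path[k]] for k in range(n)]


if __name__ == "__main__":
    # Toy data: zero-mean quadratic shift penalty, likelihood peaked at `peaks`.
    peaks = [12, 19]
    refined = refine_boundaries(
        [10, 20],
        range(-2, 3),
        lambda k, d: -0.5 * d * d,
        lambda k, t: -2.0 * abs(t - peaks[k]),
    )
    print(refined)  # [12, 19]
```

Under these assumptions, further information sources such as energy dynamics or periodicity would enter as additional log-score terms inside `boundary_loglik`; how the paper actually weights or combines them is not stated in the abstract.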

Author information

Authors and Affiliations

Institute of Computing Science, Poznan University of Technology, ul. Piotrowo 2, PL-60 965, Poznań, Poland
Marcin Szymański & Stefan Grocholewski

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Botanická 68a, CZ-602 00, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Szymański, M., Grocholewski, S. (2006). Post-processing of Automatic Segmentation of Speech Using Dynamic Programming. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_66

DOI: https://doi.org/10.1007/11846406_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science, Computer Science (R0)
