A Phonetic Segmentation Procedure Based on Hidden Markov Models

Pakoci, Edvin; Popović, Branislav; Jakovljević, Nikša; Pekar, Darko; Yassa, Fathy

doi:10.1007/978-3-319-43958-7_7

Conference paper
First Online: 13 August 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9811)

Abstract

This paper presents a novel variant of an automatic phonetic segmentation procedure that is especially useful when data is scarce. The procedure is built on the Kaldi speech recognition toolkit and combines and modifies several existing methods and Kaldi recipes. The specifics of both model training and test data alignment are explained in detail. The effectiveness of artificially extending the initial amount of manually labeled material during training is examined as well. Experimental results show high overall correctness of the proposed procedure in the given test environment. Several variants of the procedure are compared, and speaker-adapted context-dependent triphone models trained without the expanded manually checked data are shown to produce the best results. Several ways to improve the procedure further, as well as future work, are also discussed.
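To illustrate the general idea behind HMM-based phonetic segmentation, the following is a minimal sketch of Viterbi forced alignment. It is not the procedure from the paper and does not use Kaldi: it assumes one HMM state per phone, no transition probabilities, and a precomputed table of per-frame log-likelihoods. The function name `force_align` and the input layout are hypothetical, chosen only for this example.

```python
import math


def force_align(log_likes):
    """Align frames to a known phone sequence with a left-to-right Viterbi pass.

    log_likes[t][j] is the log-likelihood of frame t under the j-th phone of
    the transcript (an assumed input; a real system would compute it from
    acoustic models). Returns one (start_frame, end_frame_exclusive) pair
    per phone.
    """
    num_frames = len(log_likes)
    num_phones = len(log_likes[0])
    if num_frames < num_phones:
        raise ValueError("need at least one frame per phone")

    neg_inf = -math.inf
    score = [[neg_inf] * num_phones for _ in range(num_frames)]
    # advanced[t][j] is True when the best path enters phone j at frame t.
    advanced = [[False] * num_phones for _ in range(num_frames)]
    score[0][0] = log_likes[0][0]

    for t in range(1, num_frames):
        for j in range(num_phones):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else neg_inf
            if advance > stay:
                best = advance
                advanced[t][j] = True
            else:
                best = stay
            if best > neg_inf:
                score[t][j] = best + log_likes[t][j]

    # Backtrace from the last phone at the last frame to find phone starts.
    starts = [0] * num_phones
    j = num_phones - 1
    for t in range(num_frames - 1, 0, -1):
        if advanced[t][j]:
            starts[j] = t
            j -= 1

    ends = starts[1:] + [num_frames]
    return list(zip(starts, ends))


# Toy example: 6 frames, 3 phones; each frame clearly favors one phone.
match, miss = 0.0, -5.0
toy = [
    [match, miss, miss],
    [match, miss, miss],
    [miss, match, miss],
    [miss, match, miss],
    [miss, match, miss],
    [miss, miss, match],
]
print(force_align(toy))
```

Frame indices can be converted to time by multiplying by the frame shift of the feature extraction front end. A practical system would typically use several states per phone, context-dependent models, and transition probabilities, all of which this sketch leaves out.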

Acknowledgments

This research was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia, under Grant No. TR32035. The authors are grateful to the company “Speech Morphing, Inc.” from Campbell, CA, USA, for providing the speech corpora for the experiments.

Author information

Authors and Affiliations

Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
Edvin Pakoci, Branislav Popović & Nikša Jakovljević
AlfaNum Speech Technologies, Novi Sad, Serbia
Darko Pekar
Speech Morphing Inc., Campbell, CA, USA
Fathy Yassa

Corresponding author

Correspondence to Branislav Popović.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
Budapest University of Technology and Economics, Budapest, Hungary
Géza Németh

About this paper

Cite this paper

Pakoci, E., Popović, B., Jakovljević, N., Pekar, D., Yassa, F. (2016). A Phonetic Segmentation Procedure Based on Hidden Markov Models. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_7

DOI: https://doi.org/10.1007/978-3-319-43958-7_7
Published: 13 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer Science; Computer Science (R0)
