Fast Algorithm for Automatic Alignment of Speech and Imperfect Text Data

Tomashenko, Natalia A.; Khokhlov, Yuri Y.

doi:10.1007/978-3-319-01931-4_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8113)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

A solution to the problem of fast single-pass alignment of speech with imperfect transcripts is introduced. The proposed technique is based on constructing a special word network for segmentation. We examine robustness and segmentation quality for different types of errors and different levels of noise in the text, depending on the parameters of network tuning. Experiments showed that with properly selected parameters the algorithm is robust to noise of any type in transcripts. The proposed approach has been successfully applied to the task of creating movie subtitles.
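The abstract does not specify how the word network is built, so the following is only a toy sketch of the general idea, not the authors' construction. It assumes a linear chain of transcript words augmented with skip arcs (for transcript words that were never spoken) and garbage self-loops (for speech that is missing from the transcript). A list of observed tokens stands in for the acoustic evidence, and every name and cost here (`build_word_network`, `align`, `max_skip`, `skip_cost`, `garbage_cost`) is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    kind: str    # "word", "skip" or "garbage" (hypothetical arc types)
    cost: float

def build_word_network(words, max_skip=2, skip_cost=1.0, garbage_cost=1.5):
    """Node i means 'the first i transcript words have been passed'."""
    n = len(words)
    arcs = []
    for i in range(n + 1):
        if i < n:
            arcs.append(Arc(i, i + 1, "word", 0.0))        # word i was spoken
        arcs.append(Arc(i, i, "garbage", garbage_cost))    # untranscribed speech
        for k in range(1, max_skip + 1):
            if i + k <= n:
                arcs.append(Arc(i, i + k, "skip", skip_cost * k))  # unspoken text
    return arcs

def align(words, observed, **params):
    """Single left-to-right pass: cheapest path through the network.

    Returns (total_cost, matched), where matched maps a transcript word
    index to the index of the observed token it was aligned with.
    """
    n, t_end = len(words), len(observed)
    outgoing = {i: [] for i in range(n + 1)}
    for arc in build_word_network(words, **params):
        outgoing[arc.src].append(arc)

    inf = float("inf")
    # best[t][i]: cheapest way to explain observed[:t] and stand at node i
    best = [[inf] * (n + 1) for _ in range(t_end + 1)]
    best[0][0] = 0.0
    back = {}
    for t in range(t_end + 1):
        for i in range(n + 1):          # skip arcs only move forward in i
            if best[t][i] == inf:
                continue
            for arc in outgoing[i]:
                if arc.kind == "skip":
                    nt = t              # consumes no observation
                elif t < t_end and (arc.kind == "garbage" or words[i] == observed[t]):
                    nt = t + 1          # consumes one observation
                else:
                    continue
                cost = best[t][i] + arc.cost
                if cost < best[nt][arc.dst]:
                    best[nt][arc.dst] = cost
                    back[(nt, arc.dst)] = (t, i, arc.kind)

    if best[t_end][n] == inf:
        raise ValueError("no path through the network; try a larger max_skip")
    matched, state = {}, (t_end, n)
    while state != (0, 0):
        t, i, kind = back[state]
        if kind == "word":
            matched[i] = t
        state = (t, i)
    return best[t_end][n], matched
```

In this toy setting, raising `max_skip` or lowering the two penalties makes the network more tolerant of transcript errors at the price of looser alignments, which is the kind of parameter trade-off the abstract says was examined. A real system would score arcs against acoustic models rather than comparing text tokens.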

Author information

Authors and Affiliations

Speech Technology Center, Saint-Petersburg, Russia
Natalia A. Tomashenko & Yuri Y. Khokhlov

Editor information

Editors and Affiliations

Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Miloš Železný
University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal
Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation for the Russian Academy of Sciences, 14-th line, 39, 199178, St. Petersburg, Russia
Andrey Ronzhin

Cite this paper

Tomashenko, N.A., Khokhlov, Y.Y. (2013). Fast Algorithm for Automatic Alignment of Speech and Imperfect Text Data. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_20

DOI: https://doi.org/10.1007/978-3-319-01931-4_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)
