Grappling with Web Technologies: The Problems of Remote Speech Recording

Tihelka, Daniel; Jůzová, Markéta; Vít, Jakub

doi:10.1007/978-3-030-60276-5_57

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Modern web browsers are becoming operating systems of a kind, offering unified access to the underlying hardware. The sound device can thus be used by web-based communication systems such as Google Meet, Zoom and others. This suggests using the same capabilities to record a speech synthesis corpus through the web, which is genuinely beneficial in some use cases, for example when building personalised speech synthesis. The present paper shows that, although this may appear easy, there are some dark corners to take care of.

Acknowledgements

This research was supported by the Technology Agency of the Czech Republic (project No. TH02010307) and by a grant of the University of West Bohemia (project No. SGS-2019-027).

Author information

Authors and Affiliations

New Technologies for the Information Society, University of West Bohemia, Pilsen, Czech Republic
Daniel Tihelka
Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
Markéta Jůzová & Jakub Vít

Corresponding author

Correspondence to Daniel Tihelka.

Editor information

Editors and Affiliations

St. Petersburg Institute for Informatics and Automation, Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Institute for Applied and Mathematical Linguistics, Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

Cite this paper

Tihelka, D., Jůzová, M., Vít, J. (2020). Grappling with Web Technologies: The Problems of Remote Speech Recording. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_57

DOI: https://doi.org/10.1007/978-3-030-60276-5_57
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science (R0)
