Comparative Evaluation of Speech-to-Text Software Based on Sociodemographic and Environmental Factors

Morato, Jorge; Pedrero, Alejandro; Sanchez-Cuadrado, Sonia

doi:10.1007/978-3-031-83432-5_20

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2349)

Included in the following conference series:

International Conference on Advanced Research in Technologies, Information, Innovation and Sustainability

Abstract

The proliferation of voice assistants for information retrieval is propelled by technological advancements and seamless integration across multiple devices. Nevertheless, these systems face persistent limitations in accuracy and comprehension, particularly with accents, dialects, and uncommon terminology. Additional challenges include the cost of these technologies and their reliance on internet connectivity. This study evaluates several low-cost speech-to-text transcription tools: Windows 10, Google Docs, Gboard Android, SpeechTexter, and Speechnotes. The analysis focuses on categories that are especially error-prone in text retrieval, such as proper names, homophones, neologisms, and multilingual usage. Key variables examined include user age, message duration, and ambient noise levels. Transcription quality is assessed to determine the efficacy of voice retrieval. Results reveal significant disparities among the tools, with Gboard Android showing the highest accuracy and the lowest error rates.
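
The abstract does not say which metric underlies the reported error rates. As an illustration only, the sketch below computes word error rate (WER), a metric commonly used to score transcripts, as the word-level edit distance between a reference and a transcript divided by the reference length. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are compared case-insensitively after whitespace
    splitting; the study's own normalization rules are not known.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from transcript
            insertion = curr[j - 1] + 1   # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# A homophone confusion counts as one substitution out of four words.
print(word_error_rate("their house is there", "there house is there"))  # 0.25
```

Under this definition a transcript with many inserted words can score above 1.0, so WER is an error ratio, not a percentage bounded at 100.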

Funding

Research partially funded by the R&D grant from the Autonomous Community of Madrid (PHS-2024/PH-HUM-313).

Author information

Authors and Affiliations

Carlos III University, Avda. Universidad, 30, 28911, Leganes, Spain
Jorge Morato & Alejandro Pedrero
Complutense University, C/ Santisima Trinidad 37, 28010, Madrid, Spain
Sonia Sanchez-Cuadrado

Corresponding author

Correspondence to Sonia Sanchez-Cuadrado.

Editor information

Editors and Affiliations

Santa Elena, Systems Dept, Universidad Estatal Peninsula de Santa Elena, La Libertad, Ecuador
Teresa Guarda
University of Minho, Guimarães, Portugal
Filipe Portela
BITrum Research Group, León, Spain
Maria Fernanda Augusto

About this paper

Cite this paper

Morato, J., Pedrero, A., Sanchez-Cuadrado, S. (2025). Comparative Evaluation of Speech-to-Text Software Based on Sociodemographic and Environmental Factors. In: Guarda, T., Portela, F., Augusto, M.F. (eds) Advanced Research in Technologies, Information, Innovation and Sustainability. ARTIIS 2024. Communications in Computer and Information Science, vol 2349. Springer, Cham. https://doi.org/10.1007/978-3-031-83432-5_20

DOI: https://doi.org/10.1007/978-3-031-83432-5_20
Published: 05 March 2025
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-83431-8
Online ISBN: 978-3-031-83432-5
eBook Packages: Computer Science, Computer Science (R0)
