
Comparative Evaluation of Speech-to-Text Software Based on Sociodemographic and Environmental Factors

  • Conference paper
  • Advanced Research in Technologies, Information, Innovation and Sustainability (ARTIIS 2024)

Abstract

The proliferation of voice assistants for information retrieval is propelled by technological advances and seamless integration across multiple devices. Nevertheless, these systems face persistent limitations in accuracy and comprehension, particularly with accents, dialects, and uncommon terminology. Additional challenges include the cost of these technologies and their reliance on internet connectivity. This study conducts a comprehensive evaluation of several low-cost speech-to-text transcription tools: Windows 10, Google Docs, Gboard (Android), SpeechTexter, and Speechnotes. The analysis focuses on error-prone categories in transcription, such as proper names, homophones, neologisms, and multilingual usage. Key variables examined include user age, message duration, and ambient noise level. Transcription quality is assessed to determine the efficacy of voice retrieval. Results reveal significant disparities among the tools, with Gboard (Android) demonstrating superior accuracy and the lowest error rates.
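The abstract reports error rates without naming the metric used; a standard measure for comparing transcription tools is the word error rate (WER), i.e. the word-level edit distance between a reference transcript and the software's output, normalised by the length of the reference. The Python sketch below only illustrates that metric; it is not the authors' scoring code, and the example sentence is hypothetical.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Dynamic-programming table for word-level Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[-1][-1] / max(len(ref), 1)

# Hypothetical example: a proper name mistranscribed by a dictation tool.
print(wer("please call doctor Okonkwo tomorrow",
          "please call doctor o con quo tomorrow"))   # 0.6

Scored this way, each tool's output can be compared against the same reference sentences across the age groups, message durations, and noise levels examined in the study.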



Online Resources

  1. AssemblyAI: The #1 Speech-to-Text API for Developers. https://www.assemblyai.com/

  2. Cloud Speech-to-Text API - Marketplace - Google Cloud Platform. https://console.cloud.google.com/marketplace/product/google/speech.googleapis.com

  3. Google Documents: create and edit documents online for free. https://www.google.es/intl/es/docs/about/

  4. Gboard: Google's keyboard - Apps on Google Play. https://play.google.com/store/apps/details?id=com.google.android.inputmethod.latin&hl=es&gl=US

  5. Speechnotes: Speech to Text Online Notepad. https://speechnotes.co/

  6. SpeechTexter: Type with your voice online. https://www.speechtexter.com


Funding

This research was partially funded by R&D grant PHS-2024/PH-HUM-313 from the Autonomous Community of Madrid.

Author information


Corresponding author

Correspondence to Sonia Sanchez-Cuadrado.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Morato, J., Pedrero, A., Sanchez-Cuadrado, S. (2025). Comparative Evaluation of Speech-to-Text Software Based on Sociodemographic and Environmental Factors. In: Guarda, T., Portela, F., Augusto, M.F. (eds) Advanced Research in Technologies, Information, Innovation and Sustainability. ARTIIS 2024. Communications in Computer and Information Science, vol 2349. Springer, Cham. https://doi.org/10.1007/978-3-031-83432-5_20


  • DOI: https://doi.org/10.1007/978-3-031-83432-5_20


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-83431-8

  • Online ISBN: 978-3-031-83432-5

  • eBook Packages: Computer Science, Computer Science (R0)
