
A comparative case study of neural network training by using frame-level cost functions for automatic speech recognition purposes in Spanish

Published in Multimedia Tools and Applications

Abstract

Training procedures for deep neural networks remain an area with ample research opportunities and constant improvement, whether to increase recognition accuracy or training-time performance. One of the lesser-addressed components is the objective function, an underlying aspect to consider when better error rates are needed in automatic speech recognition. The aim of this paper is to present two new variations of the frame-level cost function for training a deep neural network, with the purpose of obtaining lower word error rates in speech recognition applied to a case study in Spanish. The first proposed function is a fusion of the boosted cross-entropy and the so-called cross-entropy/log-posterior-ratio. The main idea is to jointly emphasize the prediction of difficult/crucial frames through a boosting factor and, at the same time, enlarge the distance between the target senone and its closest competitor. The second proposal is a fusion of the non-uniform mapped cross-entropy and the cross-entropy/log-posterior-ratio. This function uses the mapped function to emphasize frames whose assignment to specific senones is ambiguous, and the log-posterior-ratio to separate the target senone from the most competitive tied tri-phone state. The proposed approaches are compared against the frame-level cost functions discussed in the state of the art. The comparison is made using a personalized mid-vocabulary, speaker-independent voice corpus, employed for the recognition of digit strings and personal-name lists in Spanish from the north-central part of México on a connected-words phone-dialing task. Relative word error rate improvements of 15.14% and 12.30% are obtained with the two proposed approaches, respectively, against the plain, well-established cross-entropy loss function.
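The general shape of such a fused frame-level loss can be illustrated with a short sketch. The exact formulations, hyper-parameters, and combination weights used in the paper are not reproduced here; the boosting exponent `gamma`, the mixing weight `lam`, and the additive fusion below are assumed for illustration only. The sketch combines a boosting factor that up-weights difficult frames with a log-posterior-ratio margin between the target senone and its closest competitor:

```python
import numpy as np

def fused_frame_loss(log_probs, target, gamma=2.0, lam=0.1):
    """Illustrative frame-level loss: boosted cross-entropy fused with a
    log-posterior-ratio margin. The fusion form and the hyper-parameters
    gamma and lam are assumptions, not the paper's exact formulation.

    log_probs: 1-D array of log posteriors over senones for one frame
    target:    index of the target senone for this frame
    """
    ce = -log_probs[target]                 # plain cross-entropy for the frame
    p_target = np.exp(log_probs[target])
    boost = (1.0 - p_target) ** gamma       # emphasizes difficult/crucial frames
    # closest competitor: highest posterior among the non-target senones
    competitors = np.delete(log_probs, target)
    ratio = log_probs[target] - competitors.max()  # log-posterior-ratio margin
    return boost * ce - lam * ratio         # assumed additive fusion

# toy usage: 4 senones, frame scored with a log-softmax over raw scores
scores = np.array([2.0, 0.5, -1.0, 0.1])
log_probs = scores - np.log(np.sum(np.exp(scores)))
easy = fused_frame_loss(log_probs, target=0)  # confident, correct senone
hard = fused_frame_loss(log_probs, target=2)  # low-posterior target senone
```

A frame whose target senone already dominates the posterior contributes little (small boost, positive margin), while an ambiguous or wrongly classified frame is both up-weighted and penalized for its negative margin, which is the joint effect the proposed functions aim for.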




Author information

Corresponding author

Correspondence to Aldonso Becerra.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Becerra, A., Rosa, J.I.d., González, E. et al. A comparative case study of neural network training by using frame-level cost functions for automatic speech recognition purposes in Spanish. Multimed Tools Appl 79, 19669–19715 (2020). https://doi.org/10.1007/s11042-020-08782-0

