Application of EαNets to Feature Recognition of Articulation Manner in Knowledge-Based Automatic Speech Recognition

Siniscalchi, Sabato M.; Li, Jinyu; Pilato, Giovanni; Vassallo, Giorgio; Clements, Mark A.; Gentile, Antonio; Sorbello, Filippo

doi:10.1007/11731177_21

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3931)

Abstract

Speech recognition has become common in many application domains. Incorporating acoustic-phonetic knowledge into the design of Automatic Speech Recognition (ASR) systems has proven to be a viable way to raise ASR accuracy. Manner-of-articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as detectors of manner-of-articulation attributes, taking representations of speech signal frames as input. In this paper, a set of six detectors for the above-mentioned attributes is designed based on the E-αNet neural network model. This model was chosen for its capability to learn hidden activation functions, which results in better generalization properties. The experimental set-up and results are presented; they show an average 3.5% improvement over a baseline neural network implementation.
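
The abstract does not spell out how an E-αNet learns its hidden activation functions. As a rough illustration only, the sketch below assumes, as in the earlier αNet work on Hermite regression, that each hidden activation is a weighted sum of Hermite orthonormal functions whose coefficients are trained along with the other weights. The function names (`hermite_functions`, `learnable_activation`, `inner_product`) and the coefficient values are hypothetical and are not taken from the paper.

```python
import math


def hermite_functions(x, order):
    """Return [h_0(x), ..., h_order(x)], the orthonormal Hermite functions.

    Uses the standard three-term recurrence, which avoids large factorials.
    """
    h = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if order >= 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for k in range(1, order):
        h.append(math.sqrt(2.0 / (k + 1)) * x * h[k]
                 - math.sqrt(k / (k + 1)) * h[k - 1])
    return h


def learnable_activation(x, coeffs):
    """Activation as a weighted sum of Hermite functions.

    `coeffs` stands in for the trainable coefficients (an assumption of
    this sketch); a different coefficient vector gives a different shape.
    """
    basis = hermite_functions(x, len(coeffs) - 1)
    return sum(c * b for c, b in zip(coeffs, basis))


def inner_product(i, j, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoidal estimate of the integral of h_i(x) * h_j(x) over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for n in range(steps + 1):
        x = lo + n * dx
        h = hermite_functions(x, max(i, j))
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * h[i] * h[j] * dx
    return total


if __name__ == "__main__":
    # Two hypothetical coefficient vectors produce two different activations.
    for coeffs in ([1.0, 0.0, 0.0], [0.2, 0.9, -0.3]):
        print(coeffs, [round(learnable_activation(x, coeffs), 4)
                       for x in (-1.0, 0.0, 1.0)])
```

Because the basis is orthonormal, each coefficient controls an independent component of the activation shape, which is one reason such an expansion is convenient to train and to prune.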

Author information

Authors and Affiliations

Center for Signal and Image Processing, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, 30332, United States of America
Sabato M. Siniscalchi, Jinyu Li & Mark A. Clements
Istituto di CAlcolo e Reti ad alte prestazioni, Italian National Research Council, Viale delle Scienze (Edif. 11), 90128, Palermo, Italy
Giovanni Pilato
Dipartimento di Ingegneria Informatica, Università degli Studi di Palermo, V.le delle Scienze (Edif. 6), 90128, Palermo, Italy
Sabato M. Siniscalchi, Giorgio Vassallo, Antonio Gentile & Filippo Sorbello

Editor information

Editors and Affiliations

Dipartimento di Scienze dell’Informazione, via Comelico 39/41, 20135, Milano, Italy
Bruno Apolloni
Dipartimento di Fisica “E.R. Caianiello”, Università degli Studi di Salerno, Via S. Allende, 84081, Baronissi (SA), Italy
Maria Marinaro
Department of Mathematics and Computer Science, University of Catania, Viale A. Doria 6, 95125, Catania, Italy
Giuseppe Nicosia
Department of Mathematics and Informatics, University of Salerno, Via Ponte Don Melillo, 84084, Fisciano (SA), Italy
Roberto Tagliaferri

About this paper

Cite this paper

Siniscalchi, S.M. et al. (2006). Application of EαNets to Feature Recognition of Articulation Manner in Knowledge-Based Automatic Speech Recognition. In: Apolloni, B., Marinaro, M., Nicosia, G., Tagliaferri, R. (eds) Neural Nets. WIRN 2005, NAIS 2005. Lecture Notes in Computer Science, vol 3931. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731177_21

DOI: https://doi.org/10.1007/11731177_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33183-4
Online ISBN: 978-3-540-33184-1
eBook Packages: Computer Science, Computer Science (R0)
