Acoustic Modelling Using Continuous Rational Kernels

Layton, Martin; Gales, Mark

doi:10.1007/s11265-006-0027-4

Abstract

Many discriminative classification algorithms are designed for tasks where samples can be represented by fixed-length vectors. However, many examples in the fields of text processing, computational biology and speech recognition are best represented as variable-length sequences of vectors. Although several dynamic kernels have been proposed for mapping sequences of discrete observations into fixed-dimensional feature spaces, few kernels exist for sequences of continuous observations. This paper introduces continuous rational kernels, an extension of standard rational kernels, as a general framework for classifying sequences of continuous observations. In addition to allowing new task-dependent kernels to be defined, continuous rational kernels allow existing continuous dynamic kernels, such as Fisher and generative kernels, to be calculated using standard weighted finite-state transducer algorithms. Preliminary results on both a large vocabulary continuous speech recognition (LVCSR) task and the TIMIT database are presented.
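The abstract notes that Fisher and generative kernels map variable-length sequences of continuous observations into fixed-dimensional feature spaces. The sketch below illustrates only that mapping idea; it is not the paper's continuous rational kernel construction, which computes such kernels with weighted finite-state transducer algorithms. Several simplifying assumptions apply: a toy diagonal-covariance Gaussian mixture model stands in for the hidden Markov models used in speech recognition, the score space holds the length-normalised log-likelihood plus its derivatives with respect to the component means, and a plain inner product replaces the Fisher-information metric. The names `DiagonalGMM`, `score_vector` and `score_kernel` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


class DiagonalGMM:
    """Toy diagonal-covariance GMM used as the generative model (hypothetical example)."""

    def __init__(self, weights: Sequence[float], means: Sequence[Sequence[float]],
                 variances: Sequence[Sequence[float]]) -> None:
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def _log_component(self, m: int, o: Sequence[float]) -> float:
        """Return log(c_m * N(o; mu_m, diag(var_m)))."""
        ll = math.log(self.weights[m])
        for x, mu, var in zip(o, self.means[m], self.variances[m]):
            ll -= 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        return ll

    def frame_stats(self, o: Sequence[float]) -> Tuple[float, List[float]]:
        """Return (log p(o), component posteriors gamma_m(o)) using log-sum-exp."""
        logs = [self._log_component(m, o) for m in range(len(self.weights))]
        top = max(logs)
        total = top + math.log(sum(math.exp(l - top) for l in logs))
        return total, [math.exp(l - total) for l in logs]


def score_vector(gmm: DiagonalGMM, seq: Sequence[Sequence[float]]) -> Vector:
    """Map a sequence of any length T to a fixed-length vector:
    [(1/T) log p(O), (1/T) d log p(O)/d mu_m for every component m]."""
    T = len(seq)
    dim = len(gmm.means[0])
    loglik = 0.0
    grads = [[0.0] * dim for _ in gmm.weights]
    for o in seq:
        ll, gammas = gmm.frame_stats(o)
        loglik += ll
        # d/d mu_m log p(o) = gamma_m(o) * Sigma_m^{-1} (o - mu_m)
        for m, g in enumerate(gammas):
            for d in range(dim):
                grads[m][d] += g * (o[d] - gmm.means[m][d]) / gmm.variances[m][d]
    features = [loglik / T]
    for g in grads:
        features.extend(x / T for x in g)
    return features


def score_kernel(gmm: DiagonalGMM, seq_a: Sequence[Sequence[float]],
                 seq_b: Sequence[Sequence[float]]) -> float:
    """Inner product of score vectors (identity metric instead of the Fisher information)."""
    a = score_vector(gmm, seq_a)
    b = score_vector(gmm, seq_b)
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    gmm = DiagonalGMM(weights=[0.5, 0.5], means=[[0.0], [3.0]], variances=[[1.0], [1.0]])
    short_seq = [[0.1], [2.9]]
    long_seq = [[0.0], [0.2], [3.1], [2.8], [0.1]]
    print(len(score_vector(gmm, short_seq)), len(score_vector(gmm, long_seq)))  # 3 3
    print(score_kernel(gmm, short_seq, long_seq))
```

Both sequences map to three-dimensional vectors regardless of their lengths, so a standard kernel classifier such as an SVM can consume them. Replacing the mixture model with an HMM would require state posteriors from the forward-backward algorithm in place of the per-frame mixture posteriors used here.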

Author information

Authors and Affiliations

Department of Engineering, University of Cambridge, Trumpington St., Cambridge, CB2 1PZ, UK
Martin Layton & Mark Gales

Corresponding author

Correspondence to Martin Layton.

About this article

Cite this article

Layton, M., Gales, M. Acoustic Modelling Using Continuous Rational Kernels. J VLSI Sign Process Syst Sign Im 48, 67–82 (2007). https://doi.org/10.1007/s11265-006-0027-4

Received: 12 April 2006
Revised: 01 September 2006
Accepted: 17 October 2006
Published: 05 May 2007
Issue Date: August 2007
DOI: https://doi.org/10.1007/s11265-006-0027-4

Similar content being viewed by others

A Decade of Discriminative Language Modeling for Automatic Speech Recognition

Effects of Frequency-Based Inter-frame Dependencies on Automatic Speech Recognition

Generalization of dimension-based statistical learning
