Abstract
We have described a series of predictive models developed to capture dependencies within non-stationary sequences. Although the motivations and sources of inspiration behind these models vary, they all pursue the same goal. Other approaches exist that we have not described here. One important class, which uses parametric trajectories, is that of Segment Models; a review and a comparison with HMMs may be found in [37]. Up to now, predictive models have not led to better results than classical multi-Gaussian HMMs. Most of the experiments reported by the different authors were performed on small or low-complexity problems. However, some authors also report excellent performance of some predictive models on different tasks. In the second part of the paper, we have described a non-linear predictive HMM based on regressive neural networks, and we have presented experiments on two relatively large tasks where the model reaches state-of-the-art performance.
References
Furui S., 1986, Speaker independent isolated word recognition using dynamic features of speech spectrum, IEEE T. ASSP, 34, 52–59.
Poritz A.B., 1982, Linear predictive HMMs and the speech signal, ICASSP, Vol. 2, 1291–1294.
Wellekens C., 1987, Explicit time correlation in hidden Markov models for speech recognition, ICASSP'87, pp 384–386.
Brown P.F., 1987, The acoustic modeling problem in automatic speech recognition, PhD thesis, Carnegie Mellon University.
Juang B.H., Rabiner L.R., 1985, Mixture autoregressive hidden Markov models for speech signals, IEEE T. ASSP, Vol. 33, No. 6, pp 1404–1413, Dec.
Kenny P., Lennig M., Mermelstein P., 1990, A linear predictive HMM for vector-valued observations with application to speech recognition, IEEE Trans. on Acoustics Speech and Signal Processing, ASSP-38, 2, pp 220–225.
Woodland P.C., 1992, Hidden Markov models using vector linear predictors and discriminative output distributions, ICASSP'92, pp 509–512.
Tishby N., 1991, On the application of mixture AR HMMs to text-independent speaker recognition, IEEE Trans. on Signal Processing, Vol. 39, No. 3, March 91.
Kawabata T., 1993, Speaker-independent speech recognition using nonlinear predictor codebooks, ICASSP.
Artières T., Gallinari P., 1995, Multi-state predictive neural models for text-independent speaker identification, Eurospeech 95.
Mellouk A., Gallinari P., 1993, A discriminative neural prediction system for speech recognition, ICASSP 93, pp II 553–536.
Deng L., Hassanein H., Elmasry M., 1994, Analysis of the correlation structure for a neural predictive model with application to speech recognition, Neural Networks, Vol. 7, No. 2, 331–339.
Bianchini M., Frasconi P., Gori M., 1995, Learning in multilayered networks used as autoassociators, IEEE Transactions on Neural Networks, Vol. 6, No. 2, 512–514.
Artières T., 1995, Approches prédictives neuronales: application à l'identification du locuteur, Thèse de doctorat, Université de Paris Sud (in French).
Tebelskis J., Waibel A., Petek B., Schmidbauer O., 1991, Continuous speech recognition using linked predictive neural networks, ICASSP 91, pp 61–64.
Iso K., Watanabe T., 1990, Speaker-independent word recognition using a neural prediction model, ICASSP.
Iso K., Watanabe T., 1991, Large vocabulary speech recognition using neural prediction model, ICASSP 91, pp 57–60.
Petek B., Waibel A., Tebelskis J., 1992, Integrated and phoneme-function word architecture of hidden control neural networks for continuous speech recognition, Speech Communication, Special Issue on Eurospeech, Vol. 11, No. 2, pp 273–282.
Levin E., 1993, Hidden control neural architecture modeling of non linear time varying systems and its applications, IEEE Trans on NN, Vol. 4.
Tsuboka E., Takada Y., Wakita H., 1990, Neural predictive hidden Markov model, ICSLP.
Rabiner L., Juang B.H., 1993, Fundamentals of speech recognition, Prentice Hall.
Deng L., Aksmanovic M., Sun X., 1994, Speech recognition using hidden Markov models with polynomial functions as nonstationary states, IEEE Trans. SAP, 507–520.
Hattori H., 1992, Text independent speaker recognition using neural networks, ICASSP, II 153–156.
Mellouk A., Gallinari P., 1994, Discriminative training for improved neural prediction system, ICASSP 94, pp 1233–1236.
Mellouk A., Gallinari P., 1995, Global discrimination for neural predictive systems based on N-Best algorithm, ICASSP'95.
Rao T.S., The fitting of nonstationary time series model with time dependent parameters, J. R. S. S. Series B, Vol. 32, No. 2, 312–322.
Liporace L.A., 1975, Linear estimation of non stationary signals, J. Acoust. Soc. Amer., Vol. 58, No. 6, 1288–1295.
Grenier Y., 1983, Time-dependent ARMA modeling of non stationary signals, IEEE T. ASSP, Vol. 31, No. 4, 899–911.
Gish H., Ng K., 1993, A segmental speech model with applications to word spotting, ICASSP'93, II 447–450.
Deng L., 1993, A stochastic model of speech incorporating hierarchical non-stationarity, IEEE T. SAP, Vol. 1, No. 4, 471–474.
Deng L., Rathinavelu C., 1995, A Markov model containing state-conditioned second order non-stationarity: application to speech recognition, Comp. Speech and Lang., 9, 63–86.
Garcia-Salicetti, Dorizzi B., Gallinari P., Wimmer Z., 1996, Adaptive discrimination in an HMM based neural predictive system for on-line word recognition, ICPR-96.
Robinson T., 1991, Several improvements to a recurrent error propagation network phone recognition system, Tech. Rep. CUED/F-INFENG/TR.82, Cambridge Univ. Eng. Dept, Sept.
Lee K. F., Hon H-W., 1989, Speaker-independent phone recognition using hidden Markov models, IEEE Trans. ASSP, Vol. 37, No. 11, 1641–1648.
Manke S., Finke M., Waibel A., 1995, NPen++: a writer independent large vocabulary on line hand-writing recognition system, ICDAR'95, 403–408.
Schwartz R., Chow Y.L., 1990, The N-Best algorithm: An efficient and exact procedure for finding the N most likely hypotheses, ICASSP 90, pp 81–84.
Ostendorf M., Digalakis V., Kimball O.A., 1996, From HMM's to segment models: a unified view of stochastic modelling for speech recognition, IEEE T. SAP, Vol. 4, No. 5, 360–378.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gallinari, P. (1998). Predictive models for sequence modelling, application to speech and character recognition. In: Giles, C.L., Gori, M. (eds) Adaptive Processing of Sequences and Data Structures. NN 1997. Lecture Notes in Computer Science, vol 1387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054007
DOI: https://doi.org/10.1007/BFb0054007
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64341-8
Online ISBN: 978-3-540-69752-7
eBook Packages: Springer Book Archive