
Continuous-Speech Recognition in the SPICOS-II System

  • Conference paper

Part of the book series: Informatik-Fachberichte (INFORMATIK, volume 227)


Abstract

The goal of the SPICOS project is the development of a system for answering spoken database queries. This paper describes the present state of two modules for speech recognition developed in this project. The two approaches described can be characterized as a bottom-up and an integrated approach.

In the bottom-up approach, a data-driven two-network matching parser compares the input network of alternative phonological units with a word lexicon organized as a cyclic network. This lexicon not only contains the standard pronunciations but also models inter-word and intra-word assimilations. Substitutions, deletions, and insertions of single phonemes are also taken into account during the match. The output of the parser is a network of word hypotheses. Results are presented for phoneme and word recognition.
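The handling of substitutions, deletions, and insertions can be illustrated with a much simpler setting than the one described above. The sketch below is not the SPICOS parser: it aligns a single hypothesized phoneme string against each pronunciation variant of a word using a standard edit distance, whereas the actual system matches two networks. The function names, the unit costs, and the toy lexicon (including its phoneme symbols) are assumptions made for illustration only.

```python
def phoneme_edit_distance(hyp, ref, sub_cost=1.0, del_cost=1.0, ins_cost=1.0):
    """Minimum cost of aligning hypothesized phonemes `hyp` with a
    lexicon pronunciation `ref`, allowing substitutions, deletions
    (phoneme in `ref` missing from `hyp`) and insertions (extra phoneme
    in `hyp`). Unit costs are an assumption, not values from the paper."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost          # all reference phonemes deleted
    for j in range(1, cols):
        dist[0][j] = j * ins_cost          # all hypothesis phonemes inserted
    for i in range(1, rows):
        for j in range(1, cols):
            match = 0.0 if ref[i - 1] == hyp[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j - 1] + match,   # match or substitution
                dist[i - 1][j] + del_cost,    # deletion
                dist[i][j - 1] + ins_cost,    # insertion
            )
    return dist[-1][-1]


def best_word(hyp, lexicon):
    """Return the word whose closest pronunciation variant has the
    lowest alignment cost against `hyp`."""
    return min(
        lexicon,
        key=lambda w: min(phoneme_edit_distance(hyp, p) for p in lexicon[w]),
    )


# Hypothetical toy lexicon: each word lists a standard pronunciation and,
# where applicable, an assimilated variant.
LEXICON = {
    "haben": [["h", "a:", "b", "@", "n"], ["h", "a:", "b", "m"]],
    "hafen": [["h", "a:", "f", "@", "n"]],
}

if __name__ == "__main__":
    print(best_word(["h", "a:", "b", "m"], LEXICON))
```

Listing assimilated variants next to the standard pronunciation lets a reduced form match at zero cost instead of being penalized as a deletion plus a substitution, which is the intuition behind modelling assimilations in the lexicon.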

In the integrated approach, there are three knowledge sources: phoneme models, a pronunciation lexicon, and a language model. They are integrated into a global search procedure that is based on statistical decision theory and finds the word sequence that best explains the input speech signal. A stochastic language model based on probabilities of trigrams, bigrams, and unigrams of word categories is used to incorporate language restrictions into the search process. The word-error rate is reduced from 22% without a language model to 9% with a stochastic language model.
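One common way to combine trigram, bigram, and unigram probabilities of word categories is linear interpolation of relative frequencies. The abstract does not say how the three orders are combined in SPICOS-II, so the sketch below should be read as one plausible scheme, not as the authors' method. The class name, the fixed weights, the `<s>` padding symbol, and the category labels are all hypothetical, and the mapping from categories to individual words is left out.

```python
from collections import Counter


class InterpolatedCategoryLM:
    """Category trigram model with fixed linear interpolation weights.
    The weights are assumed values; in practice they would be estimated
    on held-out data."""

    def __init__(self, weights=(0.6, 0.3, 0.1)):
        self.w3, self.w2, self.w1 = weights
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.ctx1, self.ctx2 = Counter(), Counter()  # history counts
        self.total = 0

    def train(self, sequences):
        for seq in sequences:
            padded = ["<s>", "<s>"] + list(seq)
            for i in range(2, len(padded)):
                c2, c1, c = padded[i - 2], padded[i - 1], padded[i]
                self.uni[c] += 1
                self.bi[(c1, c)] += 1
                self.tri[(c2, c1, c)] += 1
                self.ctx1[c1] += 1
                self.ctx2[(c2, c1)] += 1
                self.total += 1

    def prob(self, c, c2, c1):
        """P(c | c2, c1); an unseen history falls back to the next
        lower-order estimate so the distribution still sums to one."""
        p1 = self.uni[c] / self.total
        p2 = self.bi[(c1, c)] / self.ctx1[c1] if self.ctx1[c1] else p1
        p3 = (self.tri[(c2, c1, c)] / self.ctx2[(c2, c1)]
              if self.ctx2[(c2, c1)] else p2)
        return self.w3 * p3 + self.w2 * p2 + self.w1 * p1


if __name__ == "__main__":
    lm = InterpolatedCategoryLM()
    lm.train([["DET", "NOUN", "VERB"], ["DET", "ADJ", "NOUN"]])
    print(lm.prob("NOUN", "<s>", "DET"))
```

In a search procedure of the kind described above, the logarithm of such a probability would be added to the acoustic score each time a word hypothesis is extended, so that unlikely category sequences are penalized during the search and not filtered afterwards. For scale, the reported drop from 22% to 9% word-error rate corresponds to a relative reduction of about 59%.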




Copyright information

© 1989 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Paeseler, A., Ney, H., Steinbiss, V., Höge, H., Marschall, E. (1989). Continuous-Speech Recognition in the SPICOS-II System. In: Brauer, W., Freksa, C. (eds) Wissensbasierte Systeme. Informatik-Fachberichte, vol 227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-75182-0_28


  • DOI: https://doi.org/10.1007/978-3-642-75182-0_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-51838-9

  • Online ISBN: 978-3-642-75182-0

  • eBook Packages: Springer Book Archive
