Abstract
In typical speech recognition applications, the designer of the application must supply the recognition engine with context-free grammars defining the set of allowable utterances for each recognition.
In the case of the recognition of numeric quantities such as phone and room numbers, such grammars can be fairly tricky to formulate. The natural pronunciations of a number may have many variations and special cases depending on the language and country. Consider, for example, the pronunciations of the four-digit phone extension “2200” in the United Kingdom. One could pronounce this “two-two-zero-zero”, of course, but you will also hear “twenty-two hundred”, “double-two double-naught”, and many other combinations. In North America, on the other hand, you would almost never hear the use of “double”, “triple”, or “naught”. Conventions for pronouncing numbers also depend heavily on the type of quantity being recognized. The year 1987, for example, could be pronounced “nineteen eighty-seven” or, in literary contexts, “nineteen hundred and eighty seven”, but never “one-nine eighty-seven” or “one-thousand-nine-hundred and eighty-seven”.
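To make the “2200” example concrete, here is a minimal Python sketch that enumerates pronunciations of a four-digit extension by repeatedly rewriting a prefix of the digit string into words. It is only an illustration of the variation described above, not the system presented in the talk: the three rules, the `locale` flag, and every function name are assumptions introduced for this example, and the words are joined with spaces rather than hyphens.

```python
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def digit_names(d, locale):
    """Spoken names of one digit; 'naught' is assumed to be UK-only."""
    names = [ONES[int(d)]]
    if d == "0" and locale == "uk":
        names.append("naught")
    return names


def two_digit_name(ab):
    """Name of a two-digit number from 10 to 99, e.g. '22' -> 'twenty-two'."""
    a, b = int(ab[0]), int(ab[1])
    if a == 1:
        return TEENS[b]
    return TENS[a] if b == 0 else f"{TENS[a]}-{ONES[b]}"


# Each hypothetical rule maps a digit string to (words, remaining digits) pairs.
def rule_digit(s, locale):
    return [(name, s[1:]) for name in digit_names(s[0], locale)]


def rule_double(s, locale):
    # "double X" for a repeated digit; assumed to be UK-only.
    if locale == "uk" and len(s) >= 2 and s[0] == s[1]:
        return [(f"double {name}", s[2:]) for name in digit_names(s[0], locale)]
    return []


def rule_hundred(s, locale):
    # "NN hundred" for a whole four-digit string of the form NN00.
    if len(s) == 4 and s[0] != "0" and s[1] != "0" and s[2:] == "00":
        return [(f"{two_digit_name(s[:2])} hundred", "")]
    return []


RULES = [rule_digit, rule_double, rule_hundred]


def pronunciations(s, locale):
    """All word strings reachable by applying the rules left to right."""
    if not s:
        return {""}
    results = set()
    for rule in RULES:
        for words, rest in rule(s, locale):
            for tail in pronunciations(rest, locale):
                results.add(f"{words} {tail}".strip())
    return results


if __name__ == "__main__":
    for locale in ("uk", "us"):
        print(locale, sorted(pronunciations("2200", locale)))
```

Even with only three toy rules, the UK setting yields many more variants than the North American one, which hints at how quickly a hand-written grammar grows.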
The richness and idiosyncrasies of pronunciation possibilities, together with the combinatorics of dealing with multi-digit numbers of varying lengths, thus make the practical task of manually designing sufficiently complete number grammars laborious and error prone. This is especially true for applications that must be localized to countries having differing telephone dial plans and conventions for natural pronunciation. Ideally, one would like to be able to generate such grammars automatically, or at least semi-automatically.
Doing so requires an intuitive specification technique that allows one to easily encode pronunciation rules from which a grammar can be automatically derived. In this talk, we will show how to define certain formal term rewriting systems – which we call Number Generating Term Rewriting Systems (NGTRS) – that work extremely well for this purpose. We will show that given an NGTRS that generates pronunciations for digit strings representing numeric quantities, we can mechanically generate an equivalent context-free grammar. We’ll give some examples in a number of different natural languages, and explain how classical term rewriting proof techniques can be used to verify the consistency and completeness of the construction.
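The abstract does not spell out the NGTRS formalism or the grammar construction, so the following Python sketch should be read only as a rough analogy for the idea of deriving a grammar mechanically from a rule table. It assumes a hypothetical rule format (digits consumed, symbols emitted, UK-only flag) and invents length-indexed nonterminals `N0` … `Nk`, where `Nk` covers a string of `k` remaining digits; none of these names come from the talk.

```python
from itertools import product

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Hypothetical rule table: (digits consumed, symbols emitted, UK-only?)
RULES = [
    (1, ("DIGIT",), False),           # speak one digit
    (2, ("double", "DIGIT"), True),   # "double X" stands for the digits XX
]


def build_grammar(length, uk):
    """Derive productions {nonterminal: [rhs, ...]} from the rule table."""
    grammar = {"DIGIT": [(w,) for w in DIGIT_WORDS], "N0": [()]}
    if uk:
        grammar["DIGIT"].append(("naught",))
    for k in range(1, length + 1):
        # One production per applicable rule: emit its symbols, then
        # continue with the nonterminal for the digits that remain.
        grammar[f"N{k}"] = [
            emitted + (f"N{k - consumed}",)
            for consumed, emitted, uk_only in RULES
            if consumed <= k and (uk or not uk_only)
        ]
    return grammar


def expand(grammar, symbol):
    """Enumerate every word sequence derivable from `symbol`."""
    if symbol not in grammar:          # terminal word
        yield (symbol,)
        return
    for rhs in grammar[symbol]:
        parts = [list(expand(grammar, s)) for s in rhs]
        for combo in product(*parts):
            yield tuple(word for part in combo for word in part)


if __name__ == "__main__":
    g = build_grammar(2, uk=True)
    for nonterminal, alternatives in g.items():
        print(nonterminal, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
    print(len(set(expand(g, "N2"))), "two-digit utterances")
```

Because the grammar is produced from the rule table rather than written by hand, adding or removing a rule for a given locale changes every affected production at once. Mapping a recognized utterance back to its digit string is left out of this sketch.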
The method we describe was put to real-world use in Vocera’s speech-controlled communication system. This is a commercial product that consists of tiny, Star Trek-like wearable communicator badges operating against an enterprise-class server on a Wi-Fi network. The system is currently used by nearly a half-million nurses and doctors every day in hospitals in several countries.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shostak, R.E. (2012). Applying Term Rewriting to Speech Recognition of Numbers. In: Aoki, T., Taguchi, K. (eds) Formal Methods and Software Engineering. ICFEM 2012. Lecture Notes in Computer Science, vol 7635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34281-3_3
DOI: https://doi.org/10.1007/978-3-642-34281-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34280-6
Online ISBN: 978-3-642-34281-3
eBook Packages: Computer Science, Computer Science (R0)