Applying Term Rewriting to Speech Recognition of Numbers

Shostak, Robert E.

doi:10.1007/978-3-642-34281-3_3

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7635)

Included in the following conference series:

International Conference on Formal Engineering Methods

Abstract

In typical speech recognition applications, the designer of the application must supply the recognition engine with context-free grammars defining the set of allowable utterances for each recognition.

In the case of the recognition of numeric quantities such as phone and room numbers, such grammars can be fairly tricky to formulate. The natural pronunciations of a number may have many variations and special cases depending on the language and country. Consider, for example, the pronunciations of the four-digit phone extension “2200” in the United Kingdom. One could pronounce this “two-two-zero-zero”, of course, but one will also hear “twenty-two hundred”, “double-two double-naught”, and many other combinations. In North America, on the other hand, one would almost never hear “double”, “triple”, or “naught”. Conventions for pronouncing numbers also depend heavily on the type of quantity being recognized. The year 1987, for example, could be pronounced “nineteen eighty-seven” or, in literary contexts, “nineteen hundred and eighty-seven”, but never “one-nine eighty-seven” or “one-thousand-nine-hundred and eighty-seven”.
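To make the “2200” example concrete, the following is a minimal Python sketch, not the method of the paper: it enumerates spoken forms of a digit string by repeatedly rewriting a prefix of the remaining digits into words. The rule set, the `uk` flag, and the names `prefix_rules` and `pronunciations` are all illustrative assumptions; the toy rules ignore teens, "triple", and year conventions, and they overgenerate (for example "two twenty zero").

```python
UNITS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def prefix_rules(s, uk):
    """Yield (spoken, digits_consumed) for each toy rule matching a prefix of s.

    These rules are hypothetical and only cover the patterns named in the text.
    """
    d = int(s[0])
    names = [UNITS[d]]
    if uk and d == 0:
        names.append("naught")              # assumed UK-only alternative for "0"
    for name in names:
        yield name, 1                       # "2" -> "two"
        if uk and len(s) >= 2 and s[1] == s[0]:
            yield "double-" + name, 2       # "22" -> "double-two" (UK only)
    if len(s) >= 2 and d >= 2:
        u = int(s[1])
        word = TENS[d] + ("-" + UNITS[u] if u else "")
        yield word, 2                       # "22" -> "twenty-two"
        if len(s) == 4 and s[2:] == "00":
            yield word + " hundred", 4      # "2200" -> "twenty-two hundred"


def pronunciations(s, uk=False):
    """Return the set of all spoken forms the toy rules produce for s."""
    if not s:
        return {""}
    spoken_forms = set()
    for spoken, consumed in prefix_rules(s, uk):
        for rest in pronunciations(s[consumed:], uk):
            spoken_forms.add((spoken + " " + rest).strip())
    return spoken_forms


if __name__ == "__main__":
    for form in sorted(pronunciations("2200", uk=True)):
        print(form)
```

Even this small rule set yields many more forms with `uk=True` than without, which illustrates why writing the corresponding grammar by hand for every locale and number length becomes tedious.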

The richness and idiosyncrasies of pronunciation possibilities, together with the combinatorics of dealing with multi-digit numbers of varying lengths, thus make the practical task of manually designing sufficiently complete number grammars laborious and error-prone. This is especially true for applications that must be localized to countries having differing telephone dial plans and conventions for natural pronunciation. Ideally, one would like to be able to generate such grammars automatically, or at least semi-automatically.

Doing so requires an intuitive specification technique that allows one to easily encode pronunciation rules from which a grammar can be automatically derived. In this talk, we will show how to define certain formal term rewriting systems, which we call Number Generating Term Rewriting Systems (NGTRS), that work extremely well for this purpose. We will show that given an NGTRS that generates pronunciations for digit strings representing numeric quantities, we can mechanically generate an equivalent context-free grammar. We will give some examples in a number of different natural languages, and explain how classical term rewriting proof techniques can be used to verify the consistency and completeness of the construction.
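The abstract does not define the NGTRS formalism or the grammar construction, so the following Python sketch only illustrates the general idea under stated assumptions: a few hypothetical rewrite rules (`single`, `double`, `tens`) are turned mechanically into a grammar whose nonterminal `S<n>` derives the spoken forms of every n-digit string. The resulting grammar is right-linear, a special case of context-free, and the helper `rewrite` applies the same rules directly so that the two can be compared.

```python
from itertools import product

UNITS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


# Hypothetical rules: each maps a fixed-length digit block to its spoken forms.
def single(block):
    return [UNITS[int(block)]]


def double(block):
    return ["double-" + UNITS[int(block[0])]] if block[0] == block[1] else []


def tens(block):
    d, u = int(block[0]), int(block[1])
    if d < 2:
        return []                          # teens and leading zero not modelled
    return [TENS[d] + ("-" + UNITS[u] if u else "")]


RULES = [(1, single), (2, double), (2, tens)]   # (digits consumed, rule)


def build_grammar(n):
    """Return {nonterminal: [right-hand sides]} covering all n-digit strings."""
    grammar = {"S0": [[]]}                 # S0 derives the empty utterance
    for length in range(1, n + 1):
        rhs = []
        for k, rule in RULES:
            if k > length:
                continue
            for digits in product("0123456789", repeat=k):
                for spoken in rule("".join(digits)):
                    rhs.append([spoken, f"S{length - k}"])
        grammar[f"S{length}"] = rhs
    return grammar


def language(grammar, symbol):
    """Expand a nonterminal into the finite set of utterances it derives."""
    out = set()
    for rhs in grammar[symbol]:
        if not rhs:
            out.add("")
            continue
        word, nxt = rhs
        for rest in language(grammar, nxt):
            out.add((word + " " + rest).strip())
    return out


def rewrite(s):
    """Apply the same rules directly to one digit string."""
    if not s:
        return {""}
    out = set()
    for k, rule in RULES:
        if k <= len(s):
            for spoken in rule(s[:k]):
                for rest in rewrite(s[k:]):
                    out.add((spoken + " " + rest).strip())
    return out
```

For small lengths, one can check by brute force that the grammar and the rewrite rules agree, for example by comparing `language(build_grammar(3), "S3")` with the union of `rewrite(s)` over all three-digit strings. This is an empirical check on a toy, not the proof technique the talk refers to.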

The method we describe was put to real-world use in Vocera’s speech-controlled communication system. This is a commercial product that consists of tiny, Star Trek-like wearable communicator badges operating against an enterprise-class server on a Wi-Fi network. The system is currently used by nearly a half-million nurses and doctors every day in hospitals in several countries.

Author information

Authors and Affiliations

Vocera Communications, Inc., San Jose, CA, 95126, USA
Robert E. Shostak

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology (JAIST), 1-1, Asahidai, 923-1292, Nuomi, Ishikawa, Japan
Toshiaki Aoki
National Institute of Advanced Industrial Science and Technology (AIST), Nakoji 3-11-46, 661-0974, Amagasaki, Hyogo, Japan
Kenji Taguchi

Cite this paper

Shostak, R.E. (2012). Applying Term Rewriting to Speech Recognition of Numbers. In: Aoki, T., Taguchi, K. (eds) Formal Methods and Software Engineering. ICFEM 2012. Lecture Notes in Computer Science, vol 7635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34281-3_3

DOI: https://doi.org/10.1007/978-3-642-34281-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34280-6
Online ISBN: 978-3-642-34281-3
eBook Packages: Computer Science, Computer Science (R0)
